A reference link to the document and the page you are referring to would be very helpful, but if you mean TR-3749, I don't seem to be able to view that at the moment; I just get an error saying that "the document is damaged and could not be repaired". Multi-pathing with FC on ESX hasn't been a very good experience for the most part. I haven't had to use it for iSCSI there, but I assume the same rules apply. If you really want the advantages of multi-pathing, you should be using some sort of redundant network setup that segregates the network paths physically enough that a single hardware failure will only affect one path at a time. Multi-connection will just add connections, up to a limit, after the initial connection has been established. Interesting choices for IPs in that example.
Sorry to be pedantic as well, but "IP traffic is pretty much deterministic"? How is that a refutation of what I wrote? That is hardly a specific sentence at all.

Traffic between hosts with multiple interfaces in the same subnet has no deterministic source IP/interface, either when initiating or replying. (There is an option on NetApps that lets you sort of "force" the NetApp to reply from the interface where the traffic entered.) This is where so many newbies make the mistake of thinking things will get better if they just add more IP addresses. Because a host with multiple IPs in a single subnet (netmask) can answer from, or initiate traffic from, any of those addresses (the only criterion the OS cares about is routing), it breaks things in "mysterious" ways for firewalls and for things like NFS exports, just to name a few. That tons of Linux admins pile on the additional firewall hack of source routing just obfuscates the problem.

It is not a viable solution for MPIO (and a lot of other things) because you don't really know which interface a host is initiating or replying from, and if one of the interfaces on the server or the NetApp goes down, then you may or may not (again, non-deterministic) get the path failover that you want. Not to mention that I don't even think you can set up MPIO without 2 vlans/interfaces on the Windows side.

So, no urban legend, just the facts.
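For the curious: on a Linux client you can ask the kernel which source address it would actually pick toward a given destination (the address here is made up). With two addresses in the same subnet, the answer comes from routing, not from which address you "meant" to use:

    # which source IP will this host use toward the filer?
    ip route get 10.10.1.5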
Hi, Multi-pathing is just that: multiple paths. The point here is a sort of emulation of a typical FC SAN fabric. There is no good way to segregate traffic on the same subnet: operating systems don't make deterministic connections between multiple IPs on the same subnet. Since all of them are of equal value from a routing standpoint, the traffic can leave and enter on any interface it wants. MPIO requires at least 2 vlans, and at best 2 (or more) separate physical networks. Configuration of MCS for Windows is explained on page 42 of the users guide for the iSCSI software initiator. You can get a copy here: iSCSI Users Guide
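As a minimal sketch of the filer side (interface names, addresses, and subnets here are made up), a two-subnet MPIO setup is just two interfaces in two different subnets:

    filer> ifconfig e0a 10.10.1.5 netmask 255.255.255.0
    filer> ifconfig e0b 10.10.2.5 netmask 255.255.255.0

The Windows initiator then gets one interface/vlan in each subnet and establishes one session per path.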
Hi, What is telling you that you are writing at 270MB/s? How do you arrive at that number? 'dd' is probably not the best tool in the world for testing I/O either. IIRC you are using Linux, and it might just be lying to you (the reported rate often reflects the page cache, not the wire). You would probably get more for your time by reading up on the recommended NFS mount settings for Linux.
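For what it's worth (paths and sizes are made up), forcing dd past the page cache gives you a much more honest number:

    # write through to the server instead of the local page cache
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=4096 oflag=direct
    # or flush at the end so the reported rate includes the actual writeback
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=4096 conv=fsync

And typical conservative NFSv3 mount options on Linux look something like:

    mount -t nfs -o rw,hard,intr,tcp,vers=3,rsize=65536,wsize=65536 filer:/vol/vol1 /mnt/nfs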
Hi, Is this a CIFS "user" area or some NFS application? This might be a bit off-topic, but anything creating 100k links seems to be a bit broken. I can't help but wonder if alternative mounting methods (DFS/AMD, etc.) might get you out of your situation faster than fixing ONTAP (assuming the behavior of the people/software can't be changed to do things differently).

Hard/soft links can be a nice shortcut in many situations, but they always cause overhead, and even on normal UFS filesystems symlinks eat tons of inodes and both kinds obfuscate problem situations terribly. Not to mention backup and restore situations, where links don't get backed up, or get backed up many times, depending on the configuration or the brokenness of the app/user.

File lookups in filesystems with high numbers of files will generally benefit from directory substructures using a prime number of directories, because of the benefits that gives with hash table lookups. Anyway, I digress. I think you are probably going to break things most anywhere with 100k links, but that's just my 2 cents.
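To make the prime-number point concrete, here is a throwaway sketch (the bucket count and directory names are arbitrary) that spreads a flat directory over 97 subdirectories by name hash:

    #!/bin/sh
    # spread files over a prime number of buckets so name-hash lookups distribute evenly
    N=97
    for f in *; do
        [ -f "$f" ] || continue
        h=$(printf '%s' "$f" | cksum | awk -v n="$N" '{print $1 % n}')
        mkdir -p "d$h" && mv "$f" "d$h/"
    done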
You are going to get much more out of your systems if you take a moment to read the system documentation. Set up a NOW account and read the manpages (manual pages) here: http://now.netapp.com/NOW/knowledge/docs/ontap/rel7351/html/ontap/cmdref/index.htm, or use the CLI and type 'man ifconfig'.
Hi, You have to remember that "multi-session" and "multi-path" are two different things. Multiple sessions can be run over pretty much any link combination, although you will hit the limits faster on single links. Multi-path requires at least 2 different subnets (best if isolated logically and physically). Ideally, you would use at least two physical interfaces on different NICs on the filer if you really wanted redundancy. Using vlans helps you get around the need for extra physical interfaces, but it will not give you the same level of physical redundancy, and probably not the same bandwidth either: "port aggregation"/"etherchannel" load balancing is relatively deterministic and has no idea of "load" per se, just a round-robin/hash across up to 8 members keyed per MAC/IP connection. Setting up multi-session is just a matter of configuring it on the Windows host.
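If you do go the aggregation route, the filer side is just (interface and vif names are made up):

    filer> vif create multi vif0 -b ip e0a e0b
    filer> ifconfig vif0 192.168.1.10 netmask 255.255.255.0

The '-b ip' picks a link per source/destination IP pair, which is exactly why a single host-to-filer conversation never gets more than one link's worth of bandwidth.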
Hi, Basically your file sizes are too small, and it's all just zeros you are reading anyway. The system has probably read far enough ahead that it is caching almost all of your 4 files. If you really want a test, your data set has to be more random (more random data in the files) and a good deal larger, and you should be rebooting the server and the filer between runs. I'm guessing a 'sysstat -x' will show you very little disk activity. Reads will almost always be faster than writes anyway.
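Something along these lines (sizes and paths made up) gets you a data set the cache can't trivially hide, and lets you watch the disks while the test runs:

    # on the client: 8GB of incompressible data instead of zeros
    dd if=/dev/urandom of=/mnt/nfs/random.dat bs=1M count=8192

    filer> sysstat -x 1        (watch the disk read/write columns during the test)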
Hi, If you want some more detailed analysis, then TR-3801, TR-3832 and the working paper WP-7107 give a pretty good view of things. You can gain some added performance configurability by using FlexShare (a.k.a. 'priority') to control what gets higher caching priority as well. They will also tell you some of the advantages of de-dupe and Flash Cache/PAM. Good luck.
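FlexShare itself is just a couple of commands (the volume name here is made up):

    filer> priority on
    filer> priority set volume vol_db level=High cache=keep
    filer> priority show volume -v vol_db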
You probably want to paste a link that is available for non-NetApp and non-partner viewing. The TR's aren't so hard to find anyway: go to www.netapp.com, search the Library tab for Technical Reports, and enter the TR number. 😉
Hi, Basically your instincts were totally correct. Just resize the destination to be a little larger than the source (in blocks). You won't see the volume actually change size (that is, without the -b option on 'vol size') until the next snapmirror update has completed. It is unfortunate that you got so much incorrect help in this thread. I've administered up to 600 active snapmirror sessions, and re-syncing is rarely necessary, even when moving volumes. Try to check the docs next time; basically, if you understand a little of how snapmirror works with snapshots, things get a lot easier.
Hi, Most of this is unnecessary. Just make sure that the snapmirror destination volume is larger (in blocks) and update the mirror. If, for example, the source needs to go from 30GB to 35GB, just resize the destination to 36GB. You'll get a warning that you won't see the change because the volume is read-only, but that is just telling you the result won't be immediate. When the update is finished, the destination will look like it is the same size as the source.
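In commands (the volume name is made up), run on the destination filer, it is just:

    dst> vol size mirror_vol 36g
    dst> snapmirror update mirror_vol

and the destination picks up the source's visible size on that transfer.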
Hi, The FAS270c was the first system to use software disk ownership. If your "B" controller doesn't see any disks, it's probably because none are assigned. You will need to assign at least 2 disks (it'll probably complain about spares if you don't assign 3) to get a raid4 root aggregate and volume up. 'disk show' is your friend here... at least to see whether you have disks assigned. Good luck.
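Roughly (disk names and the hostname here are made up):

    FAS270-B> disk show -n                           (list unowned disks)
    FAS270-B> disk assign 0b.16 0b.17 0b.18 -o FAS270-B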
*lol* ... I did this for a customer once... from a FAS250, with a hacked HSSDC-to-fibre cable (+ a 12V DC converter from the local electronics store), to a small StorageTek tape library in stacker mode. Basically, the customer wouldn't listen. They had to go through the pain of burning out the fibre converter a few times, and a few impossible restores, before they finally broke down and did backup in a way that assured verifiable results and restores. Basically it's just a bad idea, but I'm guessing your customer is just as stubborn. I guess you can be happy that you don't have HSSDC connections on your filer. The bad part is that you basically also need one tape library per filer. Not exactly a good way to save money, because it just doesn't scale. If you have any chance, get them to pay for services rendered and run as far away as you can, hehe. Good luck.
Hi, Additional pros:

1. Since you can prioritize I/O on a per-volume basis with FlexShare (a.k.a. the priority command), it is perhaps an advantage not to mix more than one LUN per volume unless they are the same "class" of I/O.
2. Related to other snapshot operations: snapmirror operations to secondary storage can be differentiated a bit more on a per-LUN basis.
3. Moving LUNs to different aggregates can be done on a per-LUN basis.

Often enough, these things are just a matter of flexibility versus complexity. As long as you aren't going to need hundreds of volumes (the limit depends on the unit you have), one LUN per volume is probably not a bad thing. Read up on the Best Practices for setting up storage for VMware; it can be a lot different than you think if you have used other storage systems. Strategies like joining/concatting lots of small LUNs are pretty pointless on a NetApp. Good luck.
Hi, It seems like you don't have your volumes exported "rw" and "root" to your NFS clients/VMware hosts. Read the exportfs manpage to see what the exported mounting rights are, and check what a given client actually gets with 'exportfs -c ...'
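For example (addresses and the volume name are made up):

    filer> exportfs -p rw=192.168.1.10:192.168.1.11,root=192.168.1.10:192.168.1.11 /vol/vmds
    filer> exportfs -c 192.168.1.10 /vol/vmds

The -p variant also writes the rule to /etc/exports so it survives a reboot.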
Hi, Your challenge is a familiar one, because the different protocols achieve redundancy differently. NFSv3 can't do multiple networks, so Ethernet-level mechanisms are used to get redundancy. iSCSI has multi-pathing, which offers both redundancy and increased bandwidth, but with MPIO you need 2 subnets. Since you already have your 10GE links set up for Ethernet redundancy, you don't have enough NICs left for a sort of "IP-SAN" connection redundancy where you might choose to prefer certain interfaces/vlans to certain switches, and where vifs are not really necessary. The only really good solution is another pair of 10GE interfaces, but barring that, just add a second vlan to get 2 iSCSI subnets (as vlans they are probably easier to move later than 2 subnets stuffed into one vlan) and put your iSCSI traffic there. It works. You won't get the traffic balanced perfectly, but you get all of the advantages, and perhaps a good basis to migrate things at a later date.
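The vlan part is quick (VLAN IDs, addresses, and the interface name are made up):

    filer> vlan create e1a 101 102
    filer> ifconfig e1a-101 192.168.101.5 netmask 255.255.255.0
    filer> ifconfig e1a-102 192.168.102.5 netmask 255.255.255.0

Each vlan interface then carries one of the two iSCSI subnets.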
Hi, If you aren't seeing any external I/O when these spikes come, then it is most likely an internal routine. If you have the chance to upgrade (the amount of effort is small, the risk is low, and it might solve the problem without tons of investigation), then do that. You aren't doing any reallocation, which may or may not be a bad thing: your LUN fragmentation will increase over time and performance will decrease, so you might want to read up on that too. 80% full shouldn't really be a problem, at least not as decreased performance because of WAFL. Remember to reallocate your volumes when you add disks... Do try to squeeze in some reading on storage allocation.
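Checking and fixing LUN layout is a one-liner each way (the path here is made up):

    filer> reallocate measure -o /vol/vol1/lun0     (one-shot layout check)
    filer> reallocate start -f /vol/vol1/lun0       (force a full reallocation)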
Hi, If you have a NetApp HA cluster, then incorrectly configured FC paths will cause you extra CPU load, yes. This is essentially what happens when hosts access LUNs through the NetApp "partner" controller. Again, run 'lun config_check -v' on the CLI. I don't know how much free space you have, so it is hard to comment on whether or not that is a problem. To check your reallocation schedules, just run 'reallocate status -v'. I've managed to fat-finger the interval-based schedules to run minutes apart instead of days apart a few times before... It seems you really need to familiarize yourself with the NOW website as well; all of the documentation is there, and there is a ton of knowledge and help to be found. I think what you are seeing is a buggy wafl scanner. Even if I'm not a fan of pushing upgrades for everything, I was glad to get away from 7.2.x and over to 7.3.5.1.
Hi, I guess you could do a stats or perfstat run to try to pinpoint things a bit closer. I'd check your reallocation schedules one more time. You might also want to run 'lun config_check' to see if you have some passive paths in use (in the case that you have a cluster... again... no details...). The last thing is to run through the list of fixed bugs. I ran 7.3.2P4 on a few boxes for a while; it's not the worst release, but 7.3.5.1 seems to be very good up to this point. You might (will probably) find some bugs on wafl scanners that run too often. Then you can schedule an upgrade. Good luck.
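If you want to see what the scanners are doing right now, iirc this is at advanced privilege, so tread carefully:

    filer> priv set advanced
    filer*> wafl scan status
    filer*> priv set admin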
Hi, You are probably running into limits because of fractional reserve and space reservation. Thin provisioning is going to save you a lot of such headaches:

1. Create your volumes with guarantee=none.
2. Turn off the snapshot reserve. No need to reserve space, really: if you have enough space in the volume, you can take snapshots. Easy.
3. Turn off the snapshot schedule ("0 0 0").
4. Create your LUNs without space reservation.
5. Set up vol autosize to increase your volume size, alternatively/in combination with snap autodelete.

Then I think you can create 16TB LUNs... (see the sketch below). Just always make sure you monitor the capacity of your aggregates; keep them under 90%. This way you basically have one "fire" to put out instead of having to balance a bunch of different settings. There's a TR on Best Practices for Thin Provisioning as well; a good idea to read it. Good luck.
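In 7-mode commands, the whole list is roughly this (names and sizes are made up; scale to taste):

    filer> vol create tp_vol -s none aggr1 12t
    filer> snap reserve tp_vol 0
    filer> snap sched tp_vol 0 0 0
    filer> lun create -s 10t -t vmware -o noreserve /vol/tp_vol/lun0
    filer> vol autosize tp_vol -m 16t -i 100g on
    filer> snap autodelete tp_vol on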
Hi, I'm assuming that by "mappings" you are referring to some part of the igroup and lun mapping configuration. Was it just the lun maps? What ONTAP version? The only migration problems that I have run into are the iscsi nodename changing with headswaps (you can set it in one of the priv modes, iirc) and LUN serial numbers changing on certain upgrades. Having a recent ASUP mail will save you in a lot of unexpected situations. Scripting known changes will save you time. Good luck.
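From memory (so double-check against the block access guide; the iqn here is made up), carrying the nodename over looks something like:

    filer> iscsi nodename                      (record the old name before the swap)
    filer> iscsi stop
    filer> iscsi nodename iqn.1992-08.com.netapp:sn.0123456789
    filer> iscsi start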
Hi, If these messages mean what they say, then the problem is just that the user can't reach the network...

=== MESSAGE ===
Could not connect to sakr-shelf1.sakrgroup on port 443 for protocol HTTPS.
Unable to connect to the remote server
A socket operation was attempted to an unreachable network 192.168.0.124:443

=== DETAILS ===
Could not connect to sakr-shelf1.sakrgroup on port 443 for protocol HTTPS.
Unable to connect to the remote server
A socket operation was attempted to an unreachable network 192.168.0.124:443

This would be less of a filer problem than perhaps a firewall/routing problem. The client may, of course, be emitting relatively bogus messages.