We are purchasing a NetApp in the near future, most likely a FAS2040 with the internal disks and an additional shelf. It will have two controllers in a cluster config. We're probably going to use Fibre Channel to attach to our ESX environment (although it could end up being iSCSI) and also do some file sharing over CIFS. Management doesn't like NFS. While I have a fair amount of experience administering NetApp, I am a little confused about some things from a design perspective. So I am going to run through my thinking, which is based on my reading of the NetApp documentation, and ask you to let me know if you see any problems.
First, the aggregates. To maximize usable space, it looks like the best approach is to split the disks evenly between the two controllers and give each controller a single aggregate that holds both the root volume and all the data volumes. If we go with Data ONTAP 8.0.1 (which I think is supported on the FAS2040), these would have to be 32-bit aggregates. Does 8.0.1 make sense? I'm not sure we need it, but presumably it would be easier to upgrade from 8.0.1 to a later 8.x release than it would be to upgrade from 7.3.x.
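To make the space argument concrete, here is a rough Python sketch of the disk arithmetic. Every number in it is an assumption for illustration: a hypothetical total of 26 disks, one hot spare per controller, a single RAID-DP group per aggregate (two parity disks), and, for comparison, a dedicated three-disk root aggregate. The function name and parameters are made up for this example.

```python
def data_disks_per_controller(total_disks, spares=1, parity_per_group=2,
                              dedicated_root_disks=0):
    """Count disks left for data on one controller after an even split."""
    per_controller = total_disks // 2      # even split between the two controllers
    available = per_controller - spares - dedicated_root_disks
    data = available - parity_per_group    # assumes a single RAID-DP group
    if data < 1:
        raise ValueError("not enough disks for a data aggregate")
    return data


TOTAL_DISKS = 26  # hypothetical disk count, not a quoted configuration

shared = data_disks_per_controller(TOTAL_DISKS)
separate = data_disks_per_controller(TOTAL_DISKS, dedicated_root_disks=3)
print(f"root volume inside the data aggregate: {shared} data disks per controller")
print(f"dedicated 3-disk root aggregate:       {separate} data disks per controller")
```

Under these assumptions, keeping the root volume in the data aggregate leaves 10 data disks per controller instead of 7, which is the space saving the single-aggregate layout is after. Real usable capacity would be lower once right-sizing and reserves are taken into account.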
We will create a separate flexvol for each VMware LUN. For dedupe to actually save us space, each flexvol will have a volume guarantee of none, and fractional reserve and snapshot reserve will be set to zero. Snapshots will not be scheduled automatically, and the NetApp will be configured to autodelete them. Because the volumes are thin provisioned, we will track the total usage in each aggregate to make sure we have enough disk (which will hopefully be a while!). LUNs should also be small enough that we don't run into the size limitations for dedupe (we've been using relatively small LUNs anyway). For the CIFS share(s), we'll set a snapshot reserve of 20% and use a snapshot schedule.
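Since nothing is guaranteed at the volume level in this design, the aggregate is the only place where running out of space shows up. The following is a minimal Python sketch of the kind of tracking described above. The `Volume` class, the `aggregate_report` function, the 80% warning threshold, and all sizes are hypothetical; in practice the figures would come from whatever reporting we already pull from the filer.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    provisioned_gb: float  # size presented to the hosts
    used_gb: float         # space actually consumed in the aggregate


def aggregate_report(capacity_gb, volumes, warn_at=0.80):
    """Summarize overcommitment and real usage for one aggregate."""
    provisioned = sum(v.provisioned_gb for v in volumes)
    used = sum(v.used_gb for v in volumes)
    used_fraction = used / capacity_gb
    return {
        "overcommit_ratio": provisioned / capacity_gb,
        "used_fraction": used_fraction,
        "needs_attention": used_fraction >= warn_at,
    }


# Hypothetical aggregate: 4000 GB usable, four thin-provisioned 1500 GB volumes.
volumes = [Volume(f"vmware_lun{i}", 1500, 700) for i in range(1, 5)]
report = aggregate_report(4000, volumes)
print(report)
```

With these made-up numbers the aggregate is overcommitted 1.5 times but only 70% full, so it would not yet trip the warning. The point of tracking both values is that the overcommit ratio says how bad things could get, while the used fraction says how close they are.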
Each controller will be attached to both of our SAN switches (depending on the number of free Fibre Channel ports each one has, anyway). For initiator groups, I think we'll go with a single one. All ESX servers currently need to access the same set of LUNs, and I don't anticipate that changing. If it does, I guess we could add additional initiator groups.
In terms of spreading the data across the controllers, I am a little confused. The VMware LUNs we're anticipating are about 4 times the size of the CIFS data. Does it make sense to put half of the LUNs on one controller and half on the other, and to split the CIFS data in a similar way? That seems to be the most logical way to do it.
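As a sanity check on the half-and-half idea, here is a small Python sketch that places items on the two controllers by capacity, largest first, always onto the less loaded controller. The LUN and share names and the sizes are invented; the only thing taken from the description above is that the VMware LUNs total about four times the CIFS data. It balances capacity only and says nothing about I/O load.

```python
def split_across_controllers(items, controllers=("A", "B")):
    """Greedy placement: largest item first, onto the least-loaded controller."""
    load = {c: 0 for c in controllers}
    placement = {c: [] for c in controllers}
    for name, size_gb in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
        target = min(controllers, key=lambda c: load[c])
        placement[target].append(name)
        load[target] += size_gb
    return placement, load


# Hypothetical sizes in GB: 2000 GB of LUNs versus 500 GB of CIFS data (4:1).
items = {
    "lun1": 500, "lun2": 500, "lun3": 500, "lun4": 500,
    "cifs1": 250, "cifs2": 250,
}
placement, load = split_across_controllers(items)
print(placement)
print(load)
```

For equally sized LUNs and shares this ends up as the same half-and-half split proposed above, with both controllers carrying the same capacity. The greedy approach only starts to differ when the items are of uneven size.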
Thanks, and let me know what you think.