VMware Solutions Discussions

FAS2020 Aggr design for vSphere environment

DODGEBALL
7,258 Views

Hi,

I have recently purchased a FAS2020 with the following:

FAS2020A,12x300GB,Base,R5

DS14MK2 SHLF,14.0TB SATA,QS,R5

The purpose is to run VMs over iSCSI.

I have run basic setup and configured Active/Active clustering.

For the storage design, I believe I have the following options, but would appreciate any feedback on how others have configured their aggregates and volumes:

Design Option 1:

This creates new aggregates without changing the factory settings.

Controller 1:
  aggr0 - 3 x 300GB disks R5 - container for root volume + 1 hot spare
  aggr1 - remaining 300GB disks, RAID-DP + 1 hot spare (shared?)

Controller 2:
  aggr0 - 3 x 300GB disks R5 - container for root volume + 1 hot spare
  aggr2 - 13 x 1TB disks, RAID-DP + 1 hot spare

Design Option 2:

Because 8 of the 12 15K disks get used by the system after install, leaving only around 2 disks' worth of usable space from the 300GB 15K shelf, I have been considering the following:

Controller 1:
  aggr0 - 7 x 300GB disks R5 - container for root volume + 1 hot spare

Controller 2:
  aggr0 - 3 x 300GB disks R5 - container for root volume + 1 hot spare
  aggr1 - 13 x 1TB disks, RAID-DP + 1 hot spare

Questions for each option:

Can a hot spare be shared between aggregates that use the same speed of disk?

If Controller 1 goes offline, will Controller 2 take over the storage presentation?

Any other advice or design to consider?


10 REPLIES

shaunjurr
7,258 Views

Hi,

It is a little hard to comment on the setup without knowing what you are going to use the SATA disks for.

Option 1 doesn't look good: wasting 3 disks on such a small system just for the root volume is hard to justify.

I guess I would set up filer 1 with all 12 of the 300GB SAS disks and run raid4 (I can already hear the NetApp "Bible-bangers" moaning). Make sure you set the raid group size to 11 before you get going. You can convert a raid_dp aggregate to raid4 with 'aggr options aggr0 raidtype raid4', and set the group size with 'aggr options aggr0 raidsize 11'.

Setting the raid group size to 11 saves you from accidentally adding your only spare to the aggregate.  It can always be changed later if you buy more disks. Basically, you are going to need the I/O, so the more disks you can write to and read from, the better off you are.
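For what it's worth, a minimal 7-Mode CLI sketch of that sequence on filer 1, assuming the factory root aggregate is still called aggr0 and that filer 1 already owns all 12 SAS disks (check with 'disk show -v'); the number of disks to add depends on how many the factory config left in aggr0:

  aggr options aggr0 raidsize 11     # cap the raid group at 11 so the last spare can't be added by accident
  aggr options aggr0 raidtype raid4  # convert the aggregate from raid_dp to raid4 (frees the dparity disk)
  aggr add aggr0 <n>                 # grow aggr0 to 11 disks in total, keeping 1 disk as the hot spare
  aggr status -r aggr0               # verify the raid group layout
  aggr status -s                     # confirm the spare is still unassigned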

Filer 2 can use the SATA disks for both the root aggregate and data.  Again, you don't have a lot of disks.  raid4 would be useful here as well, but I think you would just end up with a small raid group size (which needs a hack to change), so there is no real win.  Just run raid_dp and perhaps set the raid group size to 13 to start with.  That will prevent anyone from adding the last disk and leaving you with no spares.
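Again just a sketch, assuming filer 2's root aggregate ends up on the SATA shelf and keeps the name aggr0:

  aggr options aggr0 raidsize 13     # raid_dp group of 13 (11 data + 2 parity); the 14th SATA disk stays a spare
  aggr add aggr0 <n>                 # grow aggr0 to 13 disks in total
  aggr status -s                     # confirm one SATA spare remains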

Spares are per controller and per disk size: a spare can cover any aggregate on the same controller that uses matching disks, but it cannot be shared with the partner.

A controller failure will result in a failover of functionality to the surviving controller.  Assuming you make no configuration errors, you should not lose any functionality. You will, of course, have potentially reduced controller performance due to having both instances running on the same controller hardware.
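If you want to verify that behaviour before going into production, the usual 7-Mode checks look something like this (takeover is disruptive, so do it in a maintenance window):

  cf status      # confirm the HA pair is enabled and the partner is up
  cf takeover    # run on the surviving node to take over its partner's identity and storage
  cf giveback    # hand the resources back once the partner has rebooted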

You have a lot more to read about setting up your VMware instances correctly and about the storage maintenance routines you need when using LUNs.  There is also potentially a lot of design planning to be done to lay out your datastores for optimal use of NetApp functionality and backup/restore SLAs.  There are thousands of pages on VMware usage with NetApp... or you can hire someone who has done this before.
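As a starting point for that LUN housekeeping, these are the per-volume settings most people end up revisiting for VMware LUN volumes ('vm_vol' is just a placeholder name; think through the space-reservation implications before copying any of this):

  vol options vm_vol fractional_reserve 0   # only if you accept thinner provisioning for LUN overwrites
  snap reserve vm_vol 0                     # LUN volumes don't need the default snapshot reserve
  snap sched vm_vol 0 0 0                   # disable scheduled snapshots; take host-consistent ones instead
  vol autosize vm_vol on                    # let the volume grow rather than have the LUN go offline when full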

Good luck

DODGEBALL
7,236 Views

Shaun,

Thanks for taking the time to share your thoughts... it seems like a logical approach to me. I have configured it as you suggested and failover is looking good.

atinivelli
7,235 Views

DODGEBALL wrote:

The purpose is to run VMs over iSCSI.

May I ask the reason for this? I know it's a never-ending discussion, but it seems quite clear now that NFS performs better than iSCSI. And, last but not least, it is easier to manage and more robust against network issues.

shaunjurr
7,235 Views

Hi,

NFS is usually about 10% slower than iSCSI, at least if you read the tests that NetApp does.  VMware's NFS implementation is also still maturing and isn't the world's best, imho.  If it's linked to Linux NFS in any way, then this is probably understandable.  iSCSI and FC SAN are not pretty pictures either, given their lack of true multi-pathing over many years.

The main element here is probably cost of implementation.  NFS licenses are hugely expensive compared to iSCSI, so if you don't have a large Unix environment or can't get a good rebate, then NFS for VMware just isn't financially feasible when weighed against iSCSI.

radek_kubka
7,235 Views

Well, many people seem to love VMware on NFS, and there are multiple reasons for that (beyond performance):

http://communities.netapp.com/message/8823#8823

Re the commercial aspect: the NFS license is actually free of charge on any newly purchased FAS2020 (as is CIFS).

Regards,
Radek

shaunjurr
7,236 Views

I have VMware setups running FC, iSCSI and NFS.  The least problematic are frankly the FC setups.  It's pretty much just set it up and let it run.

iSCSI can be a real PITA if you don't have your network people behind you.  Then you have to really school your Windows people on getting all the OS timeouts right... which should be the rule generally anyway, but iSCSI seems to be a bit less tolerant than NFS.

In large traditional IT companies, if you have SAN, then there is very little interest in setting up "storage networks" to run NFS with a high level of reliability and security.

The fact that NFS might be free on the junior boxes doesn't play much of a role for most of us.  It's a market positioning strategy to get NetApp in the door (the first fix is always free, hehe).  Whether the numbers crunch correctly when these people graduate to the 3000 series is a different cup of tea.

VMware guys that have used NFS generally like it a lot, but I don't find that it makes much difference to me on the storage side.  The VMware guys usually abuse the storage enough that most of the advantages are lost.

atinivelli
7,235 Views

shaunjurr wrote:

NFS is usually about 10% slower than iSCSI, at least if you read the tests that NetApp does.  VMware's NFS implementation is also still maturing and isn't the world's best, imho.

Did you read this official doc: http://media.netapp.com/documents/tr-3697.pdf ?

It shows that NFS, FCP, and iSCSI have almost the same performance, but iSCSI consumes more CPU resources than the other two protocols.

And that's given that NFS support in ESXi 4.1 is equal to or better than in 3.5 (which was used in that testing)...

shaunjurr
7,235 Views

As I mentioned earlier, I've used all the protocols with VMware, and basically it's a toss-up for me between iSCSI and NFS.  The advantages of one over the other are mitigated a lot by what your organization can deliver, not just what the VMware or storage guys want. iSCSI does allow multi-pathing (which probably doesn't work quite right on VMware), which is a definite advantage bandwidth-wise on the small systems with 1 GbE interfaces, where NFS is basically not going to balance evenly using vifs/ifgrps.

The benchmarks are dated, and CPU load isn't a problem unless you are CPU bound.  CPU capacity is growing faster than disk I/O anyway, so the problems are probably going to come down to not enough disks before either your VMware hosts or your filer CPU are red-lining, at least on the small units. Benchmarking on a 3070 with 84 disks is going to be a different world from a 2020 with 14 disks.

My experience is that NFS response time is higher, even if NFS tolerates it better than iSCSI.  Both are problematic in large environments without dedicated networks. The people working with VMware usually don't know enough not to just dump everything into one datastore, so some of the advantages get lost anyway.

DODGEBALL
7,235 Views

Initially iSCSI was used in the design, but following some initial testing (which is by no means complete), we are seeing better performance from NFS in this particular deployment than from iSCSI. The biggest difference is that the filer CPU hits a constant 100% for any copy operations executed at the VMware host level, even when they are not particularly aggressive.
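For reference, a quick way to see where the filer is spending its time during copy tests like that, assuming a 7-Mode system, is something like:

  sysstat -x 1        # one-second samples of CPU, protocol ops, disk and network throughput
  lun stats -o -i 1   # per-LUN ops and latency while the iSCSI datastores are still in play

That should show whether the constant 100% CPU correlates with iSCSI ops or with something else running at the same time.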

The result of this is revisiting the design and potentially running with NFS.

There are plans to put VCD into play, so flexibility and performance are important. iSCSI was the precursor to a fully fledged Fibre Channel rollout once the environment scales up, after management signs off on the proof of concept.

Thanks again for your input.

radek_kubka
5,705 Views

Yeah, so the story slowly unfolds...

As much as I personally like NFS, bear in mind the following:

1) CPU utilisation on the filer does not correspond directly to actual performance, which in most cases is measured by latency; also, the CPU in any bigger box will perform much better than the one in the tiny FAS2020.

2) 4Gbit (or 8Gbit) FC on your target system will perform better than 1Gbit iSCSI.

So it is a tricky situation to choose the protocol for the target solution based on a PoC setup that uses a different box (and even a different protocol).

My suggestion:

Make sure you have Unified Target Adapters in your final config & a couple of Nexus 5000 switches - that way you can run the lot: 10Gbit FCoE and/or 10Gbit NFS and/or 10Gbit iSCSI.

Regards,
Radek
