This is a multi-part question, and we are a new NetApp customer. We recently purchased a FAS2020 with twelve 450 GB SAS hard drives and two controllers, with two aggregates each set up with 6 spindles (2 for DP and 1 spare). We also purchased 3 new IBM servers with 48 GB of RAM each, running vSphere 4. We started converting our physical environment to a virtual one and immediately noticed slowdowns, specifically with Microsoft Exchange. Users have 10,000+ emails per folder; these came up almost instantly when Exchange ran on physical hardware, but now we see high latency and folders take 30-60 seconds to load.

We figured we were spindle limited, but we also changed the network setup. We have two gigabit switches, and (I'm not sure of the terminology) we originally cabled things so that if one switch failed, both controllers would stay online. We wanted speed, so we gave up this redundancy: we created a gigabit LACP trunk on each switch binding two 1 Gbps ports and connected one controller to each. We saw some improvement and things were working OK, not as fast as before but bearable.

Then we started to run out of space, so we added a 14-drive shelf with 450 GB Fibre Channel hard drives of the same size and extended each aggregate. We also removed the LACP setup and went back to redundancy over network speed. We have an HP ProCurve 3500yl and a ProCurve 2910 switch, and found out after the fact that you can't create an LACP trunk that spans both switches, which would have given us both speed and redundancy; if they were both 3500yl's we could. Since removing LACP and adding the shelf, things have gotten really slow again. Specifically, a 200 GB CIFS share we use for employee files drags when files are opened.

It seems like we are stabbing in the dark after investing a lot of money, and I am looking for specific performance measures to see what our bottleneck really is. We plan to use this SAN to host a scanned patient chart application with 125 users hitting it simultaneously (scanned document retrieval, probably 10 million documents); the SQL database for this application; another SQL database for the electronic medical records program with up to 80 simultaneous users; multiple CIFS shares; a full Citrix Platinum setup with 10+ XenApp servers supporting 115 simultaneous users; and our Exchange environment plus multiple other virtual servers (25 more).

So the two main questions are:

1) Can the FAS2020 handle all this ok?

2) What performance metrics can I use to find the bottleneck?

I'm afraid you have too few spindles for all the I/O that you need. What is the disk utilization in sysstat -ux 1?
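To put a rough number on the spindle concern, here is a back-of-the-envelope sketch in Python. It uses the layout from the original post (6 disks per aggregate, 2 for DP parity, 1 spare). The per-disk IOPS figure is only an assumed placeholder, not a measured or vendor value, so substitute your own.

```python
def data_spindles(total_disks: int, parity_disks: int, spare_disks: int) -> int:
    """Disks left to serve data after parity and spares are set aside."""
    data = total_disks - parity_disks - spare_disks
    if data < 0:
        raise ValueError("parity + spares exceed total disks")
    return data


def rough_iops(spindles: int, iops_per_disk: int) -> int:
    """Very rough random-I/O ceiling: spindles times an assumed per-disk figure."""
    return spindles * iops_per_disk


# Layout from the original post: 6 disks per aggregate, 2 DP parity, 1 spare.
per_aggregate = data_spindles(total_disks=6, parity_disks=2, spare_disks=1)

# ASSUMPTION: 175 IOPS per disk is an illustrative placeholder only.
ASSUMED_IOPS_PER_DISK = 175
estimate = rough_iops(per_aggregate, ASSUMED_IOPS_PER_DISK)

print(f"data spindles per aggregate: {per_aggregate}")
print(f"rough IOPS ceiling per aggregate: {estimate}")
```

With the original layout only 3 disks per aggregate actually carry data. The 14-drive shelf raises that, but the thread does not say how the new disks were split between the aggregates, so that part is left out here.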

You can run statit to see how many of your disks are busy. That would be the first thing I would check.

The FAS2020 is not a high-end system. You have to start small and see how far you can go. Check CPU usage and write behavior: how frequently does the filer flush its data to disk? (You can also see this in sysstat.)

I don't think that network throughput is the problem.

Here is a log from running the command on each controller for about 15 seconds, outside of business hours. Let me know your thoughts.


15 seconds is not much. But if this is outside business hours, then I can understand your problem a little.

The first controller looks fine (busy, but no real problem). On the second controller, however, there is some iSCSI write activity. Disk utilization at that moment is very high (above 80%), and the CP time is more than 10 seconds. I can imagine that if you run this during the day, you will probably see this the whole time?
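If you capture a longer run during business hours, a small script can tell you how much of the time the disks sit above the 80% mark. This is only a sketch in Python: the readings below are hypothetical values typed in by hand, not taken from the log in this thread, and `busy_fraction` is a helper name made up for the example.

```python
def busy_fraction(samples, threshold=80.0):
    """Return the fraction of samples strictly above the threshold percentage."""
    if not samples:
        raise ValueError("need at least one sample")
    over = sum(1 for s in samples if s > threshold)
    return over / len(samples)


# HYPOTHETICAL disk utilization readings (percent), one per interval.
example_util = [45, 82, 91, 88, 60, 95, 85, 79, 90, 87]

frac = busy_fraction(example_util)
print(f"{frac:.0%} of samples above 80% disk utilization")
```

If most of the daytime samples land above the threshold, that points at the disks rather than the network.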

As I thought, the maximum usage on the Ethernet interfaces is less than 400 Mb/s, so a 1 Gb/s link is enough.
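The arithmetic behind that is simple, sketched here in Python. The 400 and 1000 figures are the peak and link speed discussed in this thread; the function name is just for illustration.

```python
def link_utilization(observed_mbps: float, link_mbps: float) -> float:
    """Fraction of the link capacity that the observed throughput uses."""
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    return observed_mbps / link_mbps


# Peak of roughly 400 Mb/s on a single 1 Gb/s (1000 Mb/s) link.
peak = link_utilization(400, 1000)
print(f"peak utilization: {peak:.0%}")
```

Even at the peak the link is well under half used, so a second trunked port would not be expected to change much.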

I think that:

  • the only thing that may help you is more spindles.
  • I have the impression that your system is not sized correctly for your workload. To size it properly, you need help from a NetApp presales engineer or your reseller.

I hope this helps a little.