I was wondering if anyone has suggestions on how to improve FCP performance. Right now the most I'm able to get is around 10-20 MB/s over a 4 Gb pipe, which seems ridiculous to me.
I did not set up these aggregates; they were already provisioned this way when I came in.
Data ONTAP 8.0.2
All RAID-DP
Aggr0 - 20 disks, 15K FCAL
Aggr1 - 13 disks, 7200 ATA
Aggr1 - 20 disks, 15k FCAL
Dell M1000e blade chassis
2 Brocade FC switches Teamed
16 Dell M610 ESX servers running VMware 4.1.
Under load, CPU utilization is around 40%, ops/s stay between 450 and 750, I/O writes are around 80 MB/s, and reads are 30-35 MB/s.
sysstat output shows FCP in and out never going over 10 MB/s.
I've searched and searched, and I can't find anything on normal FCP transfer speeds or what other people are averaging, so I have nothing to compare my numbers against and can't tell whether what I'm seeing is normal. But 10 MB/s over a 4 Gb pipe can't be right in my eyes. Any input would be appreciated!
It could be a lot of things. You should be getting something around 400 MB/s. I would start by checking the FC switch configuration. Also, have you installed the FC Host Utilities on your ESX hosts? Do you have any other workload using your filer?
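For reference, the roughly 400 MB/s figure follows from the 4 Gb FC line rate. A rough sketch of the arithmetic in Python, assuming the standard 4GFC signalling rate of 4.25 Gbaud and 8b/10b encoding, and ignoring frame and protocol overhead (the function names are just for illustration):

```python
# Rough per-direction ceiling for a single 4 Gb FC link.
# Assumptions: 4.25 Gbaud signalling, 8b/10b encoding, no frame overhead counted.
LINE_RATE_BAUD = 4.25e9
ENCODING_EFFICIENCY = 8 / 10  # 8 data bits carried per 10 line bits


def fc_payload_ceiling_mb_s(line_rate_baud=LINE_RATE_BAUD,
                            efficiency=ENCODING_EFFICIENCY):
    """Return the data-rate ceiling in MB/s (1 MB = 10**6 bytes)."""
    data_bits_per_s = line_rate_baud * efficiency
    return data_bits_per_s / 8 / 1e6


def link_utilization(observed_mb_s, ceiling_mb_s):
    """Fraction of the link ceiling that an observed throughput represents."""
    return observed_mb_s / ceiling_mb_s


if __name__ == "__main__":
    ceiling = fc_payload_ceiling_mb_s()
    print(f"ceiling: {ceiling:.0f} MB/s before frame overhead")
    print(f"10 MB/s uses {link_utilization(10, ceiling):.1%} of one link")
```

Frame headers and protocol overhead bring the usable number down to the commonly quoted 400 MB/s, so 10-20 MB/s is only a few percent of what one link can carry.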
As far as the switch config goes, we don't really have anything set on it besides zoning. I'm not great with Brocade, but based on the performance statistics of the switch it doesn't look to be having any issues, or at least it doesn't look like the bottleneck.
We have not installed the FC Host Utilities on our ESX servers, where can I find that?
*edit: OK, I found the FC Host Utilities, but it seems to have similar functionality to the NetApp Virtual Storage Console plugin. Are they the same, or do you need both?
We are currently using the NVSC plugin.
There is no other workload on the filer. ESX/VMware is the only thing using it.
How are you generating the workload? Have you tried multiple streams of data simultaneously? Are you using SIO or IOMeter with multiple threads/workers?
Is the filer doing anything else at the time?
A single-threaded read or write to a filer will be slow. Also, as mentioned by Pascal, small random IO is going to be slower. I usually use SIO or IOMeter with large (64 KB) sequential workloads just to check maximum end-to-end throughput. Try this from multiple VMs/ESX hosts at the same time. It's not realistic, and it doesn't mean you will get this level in the real world, but it will show up any artificial bottlenecks along the data path.
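If you want a feel for what "multiple streams of large sequential IO" means before setting up SIO or IOMeter, here is a minimal Python sketch. It is not a replacement for either tool: the helper names are made up for illustration, the test files are tiny, and reads will mostly be served from the guest's page cache, so a real test would need files much larger than RAM on the datastore you want to measure.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # 64 KB sequential reads, as suggested above


def read_sequential(path, block_size=BLOCK_SIZE):
    """Read one file front to back in fixed-size blocks; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def run_streams(paths, workers):
    """Read all files concurrently; return (total_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_sequential, paths))
    elapsed = time.perf_counter() - start
    return total, elapsed


def make_test_files(directory, count, size_bytes):
    """Create `count` files of random data (hypothetical helper for the demo)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"stream_{i}.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = make_test_files(tmp, count=4, size_bytes=8 * 1024 * 1024)
        total, elapsed = run_streams(files, workers=4)
        print(f"{total / elapsed / 1e6:.1f} MB/s across {len(files)} streams")
```

Running the same thing with workers=1 and then with several workers, from several VMs at once, is the comparison that shows whether the path scales with concurrency.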
When we were testing speed, we had a bunch of different VM transfers from different ESX hosts going. We have not used SIO yet; we're going to set that up and see what kind of info it gives us. We did recently figure out that we have misaligned VMs, so we are fixing that now, but I find it hard to believe that is the cause of our problem.
Misaligned VMs can make a significant difference, particularly if you're pushing disk IOPS. Are these VMs on the SATA aggr by any chance? With a 100% random workload you're looking at 50-60 IOPS per disk (check statit output), and you'll be approaching 20 ms latency on SATA. With misalignment you could be seeing up to 3 times the number of disk IOPS, which will affect latency significantly if your disks are already reaching high IOPS levels.
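To put rough numbers on that, here is a small Python sketch. It assumes 512-byte sectors and the 4 KB block size Data ONTAP uses, and it assumes the 13-disk SATA aggregate is a single RAID-DP group with no spares counted among the 13, leaving 11 data disks. That layout is a guess, so check your own aggregate. The function names are only for illustration.

```python
SECTOR_BYTES = 512
WAFL_BLOCK_BYTES = 4096  # Data ONTAP stores data in 4 KB blocks


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES,
               block_bytes=WAFL_BLOCK_BYTES):
    """True if a partition starting at start_sector begins on a 4 KB boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


def host_iops_budget(data_disks, iops_per_disk, amplification=1.0):
    """Host-visible random IOPS the disks can sustain.

    amplification is the number of disk IOs generated per host IO
    (1.0 when aligned, up to about 3.0 in the misaligned case above).
    """
    return data_disks * iops_per_disk / amplification


if __name__ == "__main__":
    # Old MBR default start (sector 63) vs. a 1 MiB-aligned start (sector 2048)
    for sector in (63, 2048):
        print(f"start sector {sector}: aligned={is_aligned(sector)}")

    # ASSUMPTION: 13 disks minus 2 RAID-DP parity disks = 11 data disks
    data_disks = 11
    for per_disk in (50, 60):
        aligned = host_iops_budget(data_disks, per_disk)
        worst = host_iops_budget(data_disks, per_disk, amplification=3.0)
        print(f"{per_disk} IOPS/disk: ~{aligned:.0f} aligned, "
              f"~{worst:.0f} with 3x misalignment overhead")
```

The point is only the order of magnitude: with small random IO, a few hundred host IOPS is all that aggregate can deliver, and misalignment can cut that to a third.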
I'd suggest you fix your misalignment, then try again with a load generator using many threads across several ESX hosts/VMs. Worth googling Little's Law, and also take a look at Jason Ledbetter's doc here: https://forums.netapp.com/thread/25097
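For anyone who doesn't want to google it: Little's Law says the number of IOs in flight equals throughput times latency, so throughput = outstanding IOs / latency. A quick Python sketch of why a single stream looks slow (the latency and queue depth values below are illustrative, not measurements from this system):

```python
def iops_from_littles_law(outstanding_ios, latency_s):
    """Little's Law rearranged: throughput (IOPS) = in-flight IOs / latency."""
    return outstanding_ios / latency_s


def throughput_mb_s(outstanding_ios, latency_s, io_bytes):
    """Data rate in MB/s (1 MB = 10**6 bytes) for a given queue depth."""
    return iops_from_littles_law(outstanding_ios, latency_s) * io_bytes / 1e6


if __name__ == "__main__":
    io_bytes = 64 * 1024   # 64 KB IOs
    latency_s = 0.020      # 20 ms, the SATA latency figure mentioned above
    for outstanding in (1, 4, 16, 32):
        mb_s = throughput_mb_s(outstanding, latency_s, io_bytes)
        print(f"{outstanding:>2} outstanding IOs -> {mb_s:6.1f} MB/s")
```

In practice latency rises as queue depth grows, so throughput will not scale perfectly linearly, but with only one IO in flight the link speed is irrelevant and latency sets the ceiling.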