
Slow FCP transfer speeds


We are seeing slow FCP transfer speeds (65 MBps) over 4 Gbps fiber lines. Watching the server and the filer during the transfer, neither one is maxing out its resources. We have tried different setups to eliminate possible bottlenecks: running through a switch, direct connect between filer port 0b and a blade, and direct connect between filer port 0b and a server. In all cases we still see slow fiber speeds. We have updated HBA drivers, set the speed settings to 4 Gbps, adjusted the queue depth on the Windows server, etc., to no avail.

We have an open ticket with NetApp Support (#2000136708), but that seems to be going nowhere.

Has anyone else seen the same or similar results with regard to transfer speeds?

Re: Slow FCP transfer speeds

Yes, I have. There are limitations as to how much you can push through a single thread. For example, simply copying a file is not a very good test because it is a single-threaded process. We have a tool called SIO (Simulated IO) which allows you to craft the exact traffic pattern you want.

For a good test, I recommend you increase the number of threads that your host will 'inflict' upon the NetApp controller, then go from there. Read through the documentation for SIO and download it:

NetApp Simulated IO Tool:

http://now.netapp.com/NOW/download/tools/sio_ntap
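
For example (this is from memory, so double-check the argument order against the README that ships with the tool; the executable name varies by platform, and the file name and sizes below are just placeholders), a 2-minute run of 100% random reads at a 64k block size across 8 threads would look something like:

    sio 100 100 64k 2g 120 8 e:\testfile.dat
    # args: <read %> <random %> <block size> <file size> <run time secs> <threads> <target file(s)>

Re-run it while stepping up the thread count and compare the throughput at each step.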

Let us know how this goes.

Re: Slow FCP transfer speeds

Excellent response. We often find the same results over any protocol: SIO can push much higher throughput with multiple threads. Programmers at some of our customers have changed code to use more or fewer threads and different block sizes based on SIO what-if analysis. It's one of my favorite tools on NOW.

Some other things I would check too (not likely the issue here, but it adds to the discussion of what you can look for generally when troubleshooting FCP performance; if you are the one who set up everything, you would already know whether these were set). Without having seen the configuration, I would check anything that could be causing slower performance (see the command sketch after this list):

1) Priority settings: is FlexShare set up?

2) Is igroup throttling enabled?

3) Queue depth setting on the host. With one copy process it won't be an issue, but it can be when running multiple threads. At some customers we have had to increase the number of queues from the HBA utility when we found we couldn't push performance any higher: no bottleneck visible on the host or the FAS controller, but the host runs out of queues.

4) Switch QoS, or a vendor with a blocking architecture on an oversubscribed port. You direct-connected as well, so this isn't the case here for the single-threaded operation, but it potentially could be with more threads.
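
To make the first two items concrete, here is a rough sketch of what I would check on a 7-mode controller (exact commands and flags can vary by Data ONTAP release, so treat this as a starting point rather than gospel):

    priority show                    # is FlexShare turned on at all?
    priority show volume <volname>   # any per-volume FlexShare settings?
    igroup show -t                   # any igroup throttles configured?

For item 3, queue depth is set on the host side through the HBA vendor's utility (SANsurfer, HBAnyware, etc.) or its driver parameters, not on the filer.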

Re: Slow FCP transfer speeds

Yes, all good responses; however, the problem still remains. I have to deal with large SQL DB queries, data transfers, etc., so large single-threaded operations are a fact of life. We have tried multi-threaded operations and we do get a boost (from 65 MBps to around 90 MBps). That is still far short of what you would expect from a 4 Gbps fiber line.

The question still remains: why can't we get faster speeds? I can understand needing to tweak settings to get the last 5-10% of a speed increase, but when you're only at 16% of the rated speed there's a problem. From my point of view, 4 Gbps FCP is a well-established technology. Therefore I should be able to just set it up and, right from the start, get 60-70% of the rated speed, then do the tweaking to get the extra increase. That is not the case here.
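
Rough back-of-the-envelope math on where that 16% figure comes from (nominal numbers, ignoring protocol and frame overhead details):

    4 Gbps FC usable payload  ~ 400 MB/s per direction
    65 MB/s / 400 MB/s        ~ 16%   (single-threaded copy)
    90 MB/s / 400 MB/s        ~ 22%   (multi-threaded)

Even with multiple threads we are sitting at well under a quarter of what the link can carry.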

Question back to the readers: What speeds have you seen and what settings are you using?

One answer that came up in researching this is changing the block size. However, in my opinion, this just slants the fiber line toward that particular data type, in this case SQL. My filer is used for more than just SQL, so I would be penalizing the other traffic.

Anyway, any more thoughts and ideas?

Thanks for the feedback.

Re: Slow FCP transfer speeds

Until you see unacceptable latency on the NetApp controller, the controller is not the place to start changing settings. You need to find a way to get the host to push more throughput to the controller.

So is this a host bottleneck? Or is this a NetApp controller bottleneck?

A quick test to confirm or dismiss the controller is to monitor the FCP traffic latency while you run SIO as follows:

  • lun stats -o -i 1 <lun_path>

Your average latency should be reasonable (what counts as 'reasonable' can vary) and you should not be seeing any partner ops.
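
To be concrete, while SIO is running, watch the output of something like this (the LUN path is a placeholder; column names can differ slightly between Data ONTAP releases):

    lun stats -o -i 1 /vol/<volname>/<lunname>

Keep an eye on the average latency and queue length columns, and check that the partner ops stay at or near zero; sustained partner ops mean the traffic is taking the indirect path through the partner controller.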

So what are you seeing?

Re: Slow FCP transfer speeds

It might be a good idea to open a case on this. Then you can run perfstat and lun stats and measure performance after making the changes they recommend. Most changes will come from the best practice guides for SQL on media.netapp.com (the tech reports are public; search for SQL). Recommendations in the reports include changing the minra and no_atime_update volume settings, increasing the HBA queue depth, and adjusting SQL parameters (affinity mask, memory, worker threads, etc.), but follow the recommendations of support. GSC helps quite a bit with performance cases, including other SQL best practices on the host side as well (location of tempdb, system databases, etc.).
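
For reference, the volume settings mentioned are viewed and changed with vol options on a 7-mode controller (the volume name is a placeholder, and you should only change these after support or the relevant tech report confirms they make sense for your workload):

    vol options <volname>                       # list current settings
    vol options <volname> minra on              # minimal read-ahead
    vol options <volname> no_atime_update on    # stop updating access times on reads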

Re: Slow FCP transfer speeds

Yes I have opened a support case and sent in the results of a perfstat.

I also ran lun stats -o -i 1 <lun path> against both of the LUNs (the LUN being read from and the LUN being written to).

Results are in the attached Excel file. I would be interested in your take on the latency times and queue depth readings.

Re: Slow FCP transfer speeds

Looking at the lun stats, I can see that both the read and write ops are low. You have not said what disks you are using.

Each disk in the aggregate can do roughly:

60 ops - SATA

120 ops - 300 GB 10k FC

160 ops - 300 GB 15k FC

etc.

The total number of ops increases with the number of data disks used by WAFL (parity and DP disks not included).

The lun stats also show the queue length rising; anything above ~2 is bad in my book.

During the test, is the CPU high? What about disk utilization: is it above 70%? (Check with sysstat -m.)

I have also found that performance starts to drop off when the aggregate is about 80% full, and above 90%... oh dear.

Re: Slow FCP transfer speeds

The disks are 300 GB 10k FC.

We're using 13 disks per RAID-DP group, and there are 52 disks in the aggregate. The aggregate is 79% full.
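
Using the per-disk numbers from the previous reply, a very rough back-end ceiling (ignoring parity disks, read-ahead, and cache effects) works out to:

    52 disks total, 13 per RAID-DP group -> 4 groups x 2 parity = 8 parity disks
    52 - 8 = 44 data disks
    44 x ~120 ops (300 GB 10k FC)        ~ 5,300 ops from the aggregate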

Summary of a data transfer using sysstat -b -s 1 shows:

Summary Statistics (61 samples, 1.0 secs/sample)

      CPU   FCP   iSCSI  Partner  Total   FCP kB/s        iSCSI kB/s      Partner kB/s   Disk kB/s         CP    CP   Disk
                                          in      out     in      out     in     out     read      write   time  ty   util
Min   20%   456   41     0        977     12029   16945   24      0       0      0       25510     0       0%    *    12%
Avg   41%   1031  693    4        1846    32412   29936   2266    6079    24     0       41635     51304   74%   *    35%
Max   80%   1527  6668   30       7768    53596   44085   69241   60573   186    8       101453    124016  100%  *    95%

Apologies if the cut and paste doesn't come out well. But as you can see, disk utilization averages 35% while the CPU is at 41%.

Interesting comment about the aggregate fill level being a factor. First I've heard about that. Do you have a KB or some other reference you can send me for further reading?

Re: Slow FCP transfer speeds

Running SQL on a Windows 2003 host, when you have slow IO with a single thread to one LUN, you may want to look at the Storport LUN queue depth in addition to the HBA queue depth. Which HBA is it?
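
In case it turns out to be a QLogic adapter: the usual way to raise the per-LUN queue depth for the STORport miniport is a registry value along these lines (this is from memory and an assumption about your setup, so verify it against your HBA vendor's documentation before changing anything; Emulex does the equivalent through HBAnyware):

    Key:   HKLM\SYSTEM\CurrentControlSet\Services\ql2300\Parameters\Device
    Value: DriverParameter (REG_SZ) = "qd=254"

A reboot is typically required for the change to take effect.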