Data Backup and Recovery Discussions

Slow FCP transfer speeds


We are seeing slow FCP transfer speeds (65 MBps) over 4 Gbps fiber lines. Watching the server and the filer during the transfer, neither one is maxing out its resources. We have tried different setups to eliminate possible bottlenecks: running through a switch, direct connect between filer port 0b and a blade, and direct connect between filer port 0b and a server. In all cases we still see slow fiber speeds. We have updated HBA drivers, set speed settings to 4 Gbps, adjusted the queue depth on the Windows server, etc., to no avail.

Have an open ticket with NetApp Support ( # 2000136708) but that seems to be going nowhere.

Anyone else seen the same or similar results with regard to transfer speeds?


Re: Slow FCP transfer speeds


Yes, I have. There are limitations as to how much you can push through a single thread. For example, merely copying a file is not a very good test because it is a single-threaded process. We have a tool called SIO (Simulated IO) which allows you to craft the exact traffic pattern you want.

For a good test, I recommend you increase the number of threads that your host will 'inflict' upon the NetApp controller, then go from there. Read through the documentation for SIO and download it:

NetApp Simulated IO Tool:

Let us know how this goes.
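To make the multi-thread point concrete, here is a rough Python sketch of what a tool like SIO does: drive the target with N concurrent workers instead of one and measure the aggregate throughput. This is not SIO's actual syntax or code, just an illustration (the block size, worker count, and file names are made up):

```python
# Sketch of the idea behind SIO: generate load with N concurrent
# workers instead of one single-threaded copy. All sizes are
# illustrative, kept small so the demo runs quickly.
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 64 * 1024          # 64 KB writes
BLOCKS_PER_WORKER = 64     # 4 MB per worker

def worker(path: str) -> int:
    """Write BLOCKS_PER_WORKER blocks to path; return bytes written."""
    buf = os.urandom(BLOCK)
    with open(path, "wb") as f:
        for _ in range(BLOCKS_PER_WORKER):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    return BLOCK * BLOCKS_PER_WORKER

def run(threads: int) -> float:
    """Return MB/s achieved with the given number of concurrent writers."""
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, f"w{i}.dat") for i in range(threads)]
        t0 = time.perf_counter()
        with ThreadPoolExecutor(max_workers=threads) as ex:
            total = sum(ex.map(worker, paths))
        return total / (time.perf_counter() - t0) / 1e6

if __name__ == "__main__":
    for n in (1, 4):
        print(f"{n} thread(s): {run(n):.1f} MB/s")
```

With a real SAN target on the other end, the multi-worker run is where you would expect to see the throughput climb past the single-thread ceiling.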

Re: Slow FCP transfer speeds


Excellent response... we often find the same results over any protocol, where sio can push a much higher throughput with multiple threads. Programmers at some of our customers have changed code to use more/fewer threads and different block sizes based on sio what-if analysis. It's one of my favorite tools now.

Some other things that I would check too (not likely the issue here at all, but it adds to the discussion of what you can look for generally when troubleshooting FCP performance; if you are the one who set up everything, you'd know whether these were set or not). Without having seen the configuration, I would check anything that could be causing slower performance:

1) priority settings... is FlexShare set up?

2) is igroup throttling enabled?

3) queue depth setting on the host (with one copy process it won't be an issue, but it could be when running multiple threads). We sometimes have to increase the number of queues from the HBA utility at customer sites when we find we can't push performance (no bottleneck on the host or FAS controller that we can see, but the host runs out of queues).

4) Switch QoS, or a vendor with a blocking architecture on an oversubscribed port (you direct connected as well, so this is not the case here for the one-thread operation, but it potentially could be with more threads).

Re: Slow FCP transfer speeds


Yes, all good responses; however, the problem still remains. I have to deal with large SQL DB queries, data transfers, etc., so large single-threaded operations are a fact of life. We have tried multi-threaded operations and we do get a boost (from 65 MBps to around 90 MBps). Still far short of what you would expect from a 4 Gbps fiber line.

The question still remains: why can't we get faster speeds? I can understand needing to tweak settings to get the last 5-10% of speed, but when you're only at 16% of the rated speed there's a problem. From my point of view, 4 Gbps FCP is a well-established technology. Therefore I should be able to just set it up and, right from the start, get 60-70% of the rated speed, then tweak to get the extra increase. That is not the case here.
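As a sanity check on the 16% figure (my arithmetic, not from the thread): 4 Gb/s Fibre Channel uses 8b/10b encoding, so the usable payload rate is roughly 400 MB/s, which does put 65 MBps at about 16% of line rate:

```python
# Back-of-envelope check of the utilisation figures quoted above.
line_rate_gbps = 4.0                       # 4 Gb/s Fibre Channel
usable_mbps = line_rate_gbps * 1000 / 10   # 8b/10b: 10 line bits per byte -> ~400 MB/s

single_thread = 65 / usable_mbps   # 65 MB/s observed single-threaded
multi_thread = 90 / usable_mbps    # ~90 MB/s observed with multiple threads

print(f"usable line rate ~{usable_mbps:.0f} MB/s")
print(f"single-threaded: {single_thread:.0%}")
print(f"multi-threaded:  {multi_thread:.0%}")
```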

Question back to the readers: What speeds have you seen and what settings are you using?

One answer that came up in researching this is changing the block size. However, in my opinion, this just slants the fiber line towards one particular data type, in this case SQL. My filer is used for more than just SQL, so I would be penalizing the other traffic.

Anyway more thoughts and ideas?

Thanks for the feedback.

Re: Slow FCP transfer speeds


Until you have unacceptable latency on the NetApp controller, this is not the place to start changing settings. You need to find a way to get the host to push more throughput to the controller.

So is this a host bottleneck? Or is this a NetApp controller bottleneck?

A quick test to confirm or dismiss the controller is to monitor the FCP traffic latency while you run SIO as follows:

  • lun stats -o -i 1

Your average latency should be reasonable (what's 'reasonable' can vary) and you should not be seeing any partner ops.

So what are you seeing?

Re: Slow FCP transfer speeds


It might be a good idea to open a case on this... then you can run perfstat and lun stats and measure performance after making the changes they recommend. Most changes will come from the best practice guides for SQL (there are tech reports for SQL). Recommendations in the reports include changing the minra and no_atime_update volume settings, increasing HBA queue depth, and SQL params (affinity mask, memory, worker threads, etc.), but follow the recommendations of support. GSC helps quite a bit with performance cases, including other SQL best practices on the host as well (location of tempdb, system databases, etc.).

Re: Slow FCP transfer speeds


Yes I have opened a support case and sent in the results of a perfstat.

I also ran lun stats -o -i 1 <lun path> for both of the luns (the lun being read from and the lun being written to).

Results are in the attached Excel file. Would be interested in the latency times and queue depth readings.

Re: Slow FCP transfer speeds


Looking at the lun stats I can see both the read & write ops are low. You have not said what disks you are using.

Each disk in the aggregate can do roughly:

60 ops - SATA

120 ops - 300 GB 10k FC

160 ops - 300 GB 15k FC


The total number of ops increases with the number of data disks used by WAFL. {Parity and dparity disks not included}

The lun stats also show the queue length rising; above ~2 is bad in my book.

During the test, is CPU high? What about disk utilization, above 70%? {sysstat -m}

I have also found performance starts to drop off when the aggregate is about 80% full, and above 90%... oh dear.

Re: Slow FCP transfer speeds


The disks are 300 GB 10k FC.

We're using 13 disks per RAID-DP group and there are 52 disks in the aggregate. The aggregate is 79% full.

Summary of a data transfer using sysstat -b -s 1 shows:

Summary Statistics (61 samples, 1.0 secs/sample)

CPU   FCP iSCSI Partner Total  FCP kB/s       iSCSI kB/s     Partner kB/s  Disk kB/s        CP    CP  Disk
                               in      out     in      out    in    out    read     write   time  ty  util
20%   456    41      0    977  12029  16945      24      0     0      0    25510        0     0%   *   12%
41%  1031   693      4   1846  32412  29936    2266   6079    24      0    41635    51304    74%   *   35%
80%  1527  6668     30   7768  53596  44085   69241  60573   186      8   101453   124016   100%   *   95%

Sorry, the cut and paste didn't come out well. But as you can see, disk util averages 35% while the CPU is 41%.

Interesting comment about the aggregate fill level being a factor. First I've heard about that. Do you have a KB or some other reference you can send me for further reading?

Re: Slow FCP transfer speeds


Running SQL on a Windows 2003 host, when you have slow IO with a single thread to one LUN, you may want to look at the Storport LUN queue in addition to the HBA queue depth. Which HBA is it?

Re: Slow FCP transfer speeds


Your system is spiking high but the average times are not too bad. 61 seconds is not a very big sample, however. I have just read up on Storport in W2K3 and now know more than is healthy. I just had a look on one of my SQL boxes and Storport was installed by the NetApp host kit.

I am thinking

52 disks / 13 per raid group = 4 raid groups

lose two disks in each raid group for parity

11 x 4 = 44 usable data disks

180 IOPS per disk

7920 IOPS available in the aggregate

4 KB per IOP {~31 MB per second}

These numbers can be much higher (200-400 IOPS per disk), but 180 is a good starting point. The cache also improves performance. I would say the throughput you are reporting is not too bad for your system.
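The arithmetic above can be checked quickly; note all of these figures are the rough estimates from this post, not measured values:

```python
# Checking the back-of-envelope aggregate estimate above.
disks_total = 52
raid_group_size = 13
parity_per_group = 2           # RAID-DP: one parity + one dparity disk
iops_per_disk = 180            # rough starting figure used above

groups = disks_total // raid_group_size                      # 4 raid groups
data_disks = groups * (raid_group_size - parity_per_group)   # 44 data disks
aggregate_iops = data_disks * iops_per_disk                  # 7920 IOPS
throughput_mb_s = aggregate_iops * 4 / 1024                  # 4 KB per op -> ~31 MB/s

print(groups, data_disks, aggregate_iops, round(throughput_mb_s))
```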

The aggregate capacity vs. performance observation is from what we have seen here on our filers. WAFL writes to available empty blocks: the fuller the aggregate, the less chance of striping a full raid group per op. Have a look at the statit output, under RAID Statistics (per second) and the blocks-per-stripe size, to see how you are doing.

Re: Slow FCP transfer speeds


The attach kit doesn't set the Storport LUN queues.

It works like this:

Qlogic - the storport LUN queue depth is set to 32 by default, although you can change it in the registry. See

Emulex - In HBAnyware, if you select LUN queues, the value you set is the LUN queue length. If you do not, the LUN queue is the set value divided by 4.

This scenario is almost identical to the one that caused kb26454 to exist. The application is SQL, all your IO is going to just one LUN, and you are seeing poor performance with no apparent disk bottleneck. In the end, raising the Storport LUN queue solved the issue. Because the queue was too small for the load, IO was queuing between Storport and the miniport driver, which caused throttling.

Re: Slow FCP transfer speeds


kb26454 sounds like a good idea to me. I tried raising the HBA queue from 32 to 128 when I was bench testing my SQL box before going live. I created load using MS SQLIO.exe, but throughput stayed the same and the filer lun stats showed the queue length as 8+, which I took to be a bad thing, as we have several other servers connected to the SAN.

Would be keen to know if kb26454 works. I have emulex in my SQL box....

Re: Slow FCP transfer speeds


I can say firsthand that the kb works.

For Emulex, if you don't select LUN queues, then the LUN queue is 1/4 of the default 32 HBA queue depth, i.e. 8. That's why you saw 8. Which filer are you using? How many hosts? With multiple hosts, add up the queues and make sure they don't exceed the queue depth on the filer's port. If most of your IO goes over just a LUN or two, then click it over to LUN queues and raise it up.
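The queue arithmetic described here can be sketched as follows. The Emulex divide-by-4 rule and the idea of budgeting host queues against the filer port come from this thread; the port depth of 256 below is purely an illustrative number:

```python
# Sketch of the Emulex LUN-queue rule and per-port queue budgeting
# discussed above. The target port depth used here is made up.

def effective_lun_queue(hba_queue_depth: int, lun_queues_enabled: bool) -> int:
    """Per-LUN queue as described for Emulex HBAs in this thread:
    the set value if LUN queues are selected, otherwise value // 4."""
    return hba_queue_depth if lun_queues_enabled else hba_queue_depth // 4

def port_budget_ok(host_queue_depths: list[int], target_port_depth: int) -> bool:
    """True if the combined host queues fit within the target port's depth."""
    return sum(host_queue_depths) <= target_port_depth

print(effective_lun_queue(32, False))     # default HBA depth, no LUN queues -> 8
print(port_budget_ok([32, 32, 64], 256))  # three hosts vs. an example port depth
```

This matches the 8+ queue length observed in the earlier post: 32 // 4 = 8.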

Re: Slow FCP transfer speeds


I tried adjusting the queue depth per kb26454 with no results. I don't know if I edited the registry correctly for this, especially Step 4 (I didn't understand what they meant).

Use the following procedure to change the queue depth parameter:

  1. Click Start, select Run, and open the REGEDIT/REGEDT32 program.

  2. Select HKEY_LOCAL_MACHINE and follow the tree structure down to the QLogic driver as follows: HKEY_LOCAL_MACHINE SYSTEM\CurrentControlSet\Services\ql2300\Parameters\Device .

  3. Double-click DriverParameter to edit the string.

    Example: DriverParameter REG_SZ qd=0x20

  4. Add "qd=" to the "Value data:" field. If the string "qd=" does not exist, append it to the end of the string with a semicolon.

    Example: ;qd=value

  5. Enter a value up to 254 (0xFE). The default value is 32 (0x20). The value given must be in hexadecimal format. Set this value to the queue depth value on the HBA.

The queue depth parameter can also be changed via the SANsurfer utility.

My reg entry looked like this:

DriverParameter REG_SZ qd=FE; qd=value (FE=254)

Also tried:

DriverParameter REG_SZ qd=FE

Server didn't crash but no increase in speed either. Also tried a queue depth of 128 with the same results.
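For what it's worth, the KB example quoted above writes the value in 0x-prefixed hex (qd=0x20 for 32). Here is a tiny formatter following that convention; whether the driver also accepts bare hex such as qd=FE is something I cannot confirm:

```python
# Formats a decimal queue depth the way the KB example above shows it
# (0x-prefixed hex, e.g. qd=0x20 for 32). Range limit is from the KB.

def qd_string(depth: int) -> str:
    if not 1 <= depth <= 254:          # KB: maximum is 254 (0xFE)
        raise ValueError("queue depth must be 1-254")
    return f"qd={depth:#x}"

print(qd_string(32))    # default of 32 -> qd=0x20
print(qd_string(254))   # maximum      -> qd=0xfe
```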

The other item is the note that refers to changing the queue depth via SANsurfer. I have installed SANsurfer but can't find this option.

All tests are being done over a direct connection between the 3070 filer's port 0b and a Windows server. Both HBAs are rated for 4 Gbps.

Re: Slow FCP transfer speeds


Is your motherboard bus also rated for 4Gbps? If so, is the HBA in the correct slot to achieve that?
