Network and Storage Protocols

CIFS Performance Help

dstrebel
5,875 Views

We have noticed that our CIFS performance has been poor. I set up a 100% read test from a couple of clients with IOMeter and received the following results:

Total MBs per Second = 5

Average I/O response (ms) = 375

Maximum  I/O response (ms) = 1300

What tools do you use for troubleshooting CIFS performance? What stats should be looked at? Any help would be greatly appreciated.

10 REPLIES

radek_kubka
5,850 Views

Was your test set to 100% sequential read by any chance?

Did you try reallocating the volume containing file shares in question?

http://now.netapp.com/NOW/knowledge/docs/ontap/rel707/html/ontap/cmdref/man1/na_reallocate.1.html
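A rough sketch from the console (hedged - exact flags vary by ONTAP release, so check the man page above; /vol/vol1 is just an example path):

* reallocate measure -o /vol/vol1
(one-time measurement of the volume's current layout)

* reallocate start -f /vol/vol1
(full reallocation pass - best run off-peak)

* reallocate status
(check progress)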

How about free space in the aggregate - is it less than 20%? Or less than 10%? (Low aggregate free space can cause quite substantial performance degradation.)
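You can check that quickly with:

* df -Ag
(free/used space per aggregate, in GB)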

Regards,
Radek

dstrebel
5,850 Views

No - the test was also set to random and to a 50/50 read/write mix, with the same results.

We haven't done a reallocation, to my knowledge. Does the wafl scan command need to be run from the console, or can it be done over SSH?

Free space on the aggr is about 20%.

dstrebel
5,850 Views

Here is my output from wafl scan measure_layout:

Fri Jul 17 11:55:43 CDT [k: wafl.scan.start:info]: Starting WAFL layout measurement on volume vol1.

Fri Jul 17 11:55:43 CDT [k: wafl.scan.layout.advise.ino:info]: WAFL layout ratio for volume vol1, inode 766417 is 2.00. A ratio of 1 is optimal. Based on your free space, 4.07 is expected.
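(For reference, this ran fine over SSH from an advanced-privilege prompt - roughly:

* priv set advanced
* wafl scan measure_layout vol1
* priv set
)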

I'm really not sure what the numbers mean.

radek_kubka
5,850 Views

To the best of my knowledge, it simply means that fragmentation is not an issue in your case (anything below 4 is good).

How about the connectivity from the clients to the filer? There is a chance that network congestion or an improper configuration is slowing down performance.

davieb1969
5,850 Views

Hi, is flowcontrol enabled on the interfaces and switches?

DB

dstrebel
5,850 Views

Flow control is on the interfaces. I will have to check with another group to see if it's on the switches.

dan_keating
5,850 Views

David,

What system are you running?

How are the interfaces configured (single NIC, vif, same subnet, or routed)?

Assuming flexvol on an aggregate - how many spindles and of what disk type?

External disk shelf? What type?

Is this the only workload on the aggregate or is it shared with other applications / processes?

During your IOMeter test - how about profiling the system workload a little with:

> sysstat -x 1

This should give us an impression of what you're working with and general performance on the box. It's worth dumping some of the sysstat output here for people to look at.

NB - relating to the flow control issue - my understanding is that, unless specified otherwise, flow control defaults to advertising 'full'; however, the operational flow control depends on a number of factors (autonegotiation etc.). ifstat should give this value.
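A quick way to check on the filer side (e0a is just an example interface name):

* ifconfig e0a
(the output includes the configured flowcontrol setting)

* ifstat e0a
(interface counters - some NICs report pause/xon/xoff frames here)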

dstrebel
5,850 Views

1. 3050c

2. vif, same subnet

3. 500 GB SATA, DS14 shelf, 14 disks

4. Only workload on the aggregate

5. Here is the output when running a 100% read test with IOMeter:

CPU   NFS   CIFS  HTTP  Total    Net kB/s      Disk kB/s    Tape kB/s   Cache  Cache    CP  CP  Disk   FCP  iSCSI    FCP kB/s
                                 in      out   read  write  read write    age    hit  time  ty  util                 in   out
 5%     0   1154     0   1230    592   13514    732      0     0     0     42    98%    0%   -   12%     0     76     0     0
 5%     0    812     0    910    594   13105    248     32     0     0     42    99%    0%   -    8%     0     98     0     0
 5%     0    859     0    894    604   13272    276      0     0     0     42    99%    0%   -    6%     0     35     0     0
 5%     0   1143     0   1208    571   13227    164      0     0     0     42    99%    0%   -    1%     0     65     0     0
 6%     0    819     0    972   2392   14819     48     24     0     0     42   100%    0%   -    5%     0    153     0     0
36%     0    733     0    856   2730   14283     48      0     0     0     42   100%    0%   -    2%     0    123     0     0
 7%     0   1914     0   1975    815   13263    252      8     0     0     42    99%    0%   -    7%     0     61     0     0
 5%     0   1090     0   1178    530   13175    180     24     0     0     42    99%    0%   -    9%     0     88     0     0
 7%     0   1934     0   2022   1311   14484   2140      0     0     0     42    98%    0%   -   22%     0     88     0     0
 9%     0    941     0   1134   3747   16596   3204   7692     0     0     42   100%   44%  Tf   11%     0    193     0     0
 6%     0    858     0    981    567   13035   1164   7652     0     0     42   100%   44%   :   15%     0    123     0     0
 5%     0    694     0    749    737   13013     24      8     0     0     42   100%    0%   -    4%     0     55     0     0
 5%     0    684     0    778   1295   13504     32      0     0     0     42   100%    0%   -    1%     0     94     0     0
 9%     0   2069     0   2354   4118   16927    636     24     0     0     42    99%    0%   -    6%     0    285     0     0
 5%     0    836     0    867    617   12932     48      0     0     0     42   100%    0%   -    2%     0     31     0     0
 7%     0   1886     0   2102   1081   14506    636      0     0     0     42    97%    0%   -    8%     0    216     0     0
 5%     0    810     0    904    912   13309     92     32     0     0     42   100%    0%   -    2%     0     94     0     0
 7%     0   1749     0   1787    771   16935    492      0     0     0     42   100%    0%   -   12%     0     38     0     0
 7%     0   1553     0   1643    939   17045    468      0     0     0     42   100%    0%   -   13%     0     90     0     0

dan_keating
5,850 Views

Are there separate aggregates on different loops here (i.e. more disks than the 1x DS14 MK2 SATA)? I'm seeing more IOPS than I actually expected, and there is some iSCSI activity running in parallel. Assuming dual parity plus 1 spare = 11 data disks, I would be happy if the SATA shelf was pulling 1000 IOPS. Sysstat is displaying over 2000 IOPS at times without stressing the disks at all. A large cache hit rate plus low disk utilisation would lead me to expect low latency, rather than the results you've been seeing. It's only chucking out 15 MB per second. Hmmm.
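(Rough arithmetic behind the 1000 figure: 14 disks minus 2 parity minus 1 spare = 11 data spindles, and at a ballpark 80-90 random IOPS per 7.2k SATA spindle that's on the order of 900-1000 IOPS from the shelf.)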

If I were at the console, my next step would be:

-----------------------

* sysconfig -r
(to check the raid config)

* ifconfig -a
(for interface setup)

* vif status
(check vif status - surprise)

* ifstat -a
(looking for errors)

I would also be interested in the routing. I know you've mentioned that the traffic should be same-subnet / non-routed, but I always like to take a look. For this to make sense - if you're running multiple VLANs etc. - you'll need to identify which one the CIFS traffic is supposed to be served on.

* route -sn

-----------------------

If you're happy to share this info, stick it in notepad and append it to the thread.

amiller_1
4,568 Views

I'd spend some time in Performance Advisor as well... it's one of my default first places to go for performance issues (as it covers CPU, interfaces, disks, volumes, aggregates, etc.). You can get all this stuff via stats/statit, but Performance Advisor is just much easier to wander through when you don't know exactly what you're looking for.
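If you'd rather stay on the CLI for a first pass, something along these lines (7-mode syntax from memory - check the man pages on your release):

* stats show cifs
(dumps the CIFS counter set, including latency counters)

* priv set advanced

* statit -b
(start a statit sample while the IOMeter test is running)

* statit -e
(stop the sample and print the detailed per-disk / per-interface report)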

Also, can you temporarily license NFS and see if there's any difference? Basically I'm just trying to think of a way to take a variable out of the picture... (i.e. remove the CIFS protocol and focus on the network / back-end disks / etc.)
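A hedged sketch of that (the real key comes from NetApp - XXXXXXX below is only a placeholder):

* license add XXXXXXX
(installs the NFS license key)

* license
(lists the services currently licensed)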
