Disk Utilization "Problem" / Performance Problem

hello all,

i have a question regarding disk utilization.

is it possible to have ~92% disk util when the CP type is "-"? i think i do have some sort of performance problem, any ideas how to check this out?

i can't believe that, even with sata disks, the disk util is over 90% with just 4 MB/sec of disk reads...

any comments are welcome,

kind regards

-andy

STORE> sysstat -x 1
CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
  5%     0   726     0     726  2555  1371   2784     24     0     0     1   91%   0%  -   89%      0     0     0     0     0     0
  4%     0   755     0     755  1541  1136   3312      0     0     0     1   92%   0%  -   89%      0     0     0     0     0     0
  6%     0  1329     0    1334  3379  2069   3836      8     0     0     1   90%   0%  -   79%      0     5     0     0    74     0
  4%     0   637     0     637  2804  2179   3160     24     0     0     1   92%   0%  -   86%      0     0     0     0     0     0
  4%     0   587     0     587  2386  1241   2532      8     0     0     1   94%   0%  -   98%      0     0     0     0     0     0
  8%     0   381     0     381  2374  1063   5224  15120     0     0     6s  96%  45%  Tf  78%      0     0     0     0     0     0
  7%     0   473     0     473  2902   840   3020  20612     0     0     6s  98% 100%  :f 100%      0     0     0     0     0     0
  5%     0  1131     0    1133  3542  1371   2612    400     0     0     6s  92%  35%  :   70%      0     2     0     0    20     0
  7%     0  1746     0    1746  3874  1675   3572      0     0     0     6s  92%   0%  -   79%      0     0     0     0     0     0
  8%     0  2056     0    2056  5754  3006   4044     24     0     0     6s  95%   0%  -   83%      0     0     0     0     0     0
  6%     0  1527     0    1527  2912  2162   2360      0     0     0     6s  94%   0%  -   86%      0     0     0     0     0     0
  6%     0  1247     0    1265  3740  1341   2672      0     0     0     6s  94%   0%  -   96%      0    18     0     0    98     0
  6%     0  1215     0    1220  3250  1270   2676     32     0     0     6s  92%   0%  -   86%      0     5     0     0    61     0
  4%     0   850     0     850  1991   915   2260      0     0     0     6s  90%   0%  -   75%      0     0     0     0     0     0
  7%     0  1740     0    1740  3041  1246   2804      0     0     0    13s  92%   0%  -   80%      0     0     0     0     0     0
  3%     0   522     0     531  1726  1042   2340     24     0     0    16s  88%   0%  -   69%      7     0     0    12     0     0
  6%     0   783     0     804  5401  1456   3424      0     0     0     1   92%   0%  -   89%     17     0     0    21     0     0
 10%     0   478     0     503  4229   919   5840  13072     0     0     1   95%  65%  Tf  98%     12     9     0    17    94     0
  9%     0   473     0     487  3290   945   2720  23148     0     0    31s  97% 100%  :f 100%     12     0     0    17     0     0
CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
  6%     0   602     0     606  3196   729   2380  12576     0     0    31s  97%  89%  :  100%      0     0     0     0     0     0
 10%     0  1291     0    1291 15950  3017   2680      0     0     0    31s  94%   0%  -  100%      0     0     0     0     0     0
  9%     0   977     0     977 13452  4553   4736     24     0     0    31s  96%   0%  -   92%      0     0     0     0     0     0
  6%     0   995     0     995  3923  2210   2356      8     0     0    31s  94%   0%  -   85%      0     0     0     0     0     0
  4%     0   575     0     583  1849  2948   3056      0     0     0    31s  93%   0%  -   96%      0     8     0     0   111     0
  5%     0   789     0     789  2316   742   2364     24     0     0    31s  94%   0%  -   91%      0     0     0     0     0     0
  4%     0   550     0     550  1604  1125   3004      0     0     0    31s  92%   0%  -   80%      0     0     0     0     0     0
  7%     0  1398     0    1398  2910  1358   2716      0     0     0    31s  94%   0%  -   87%      0     0     0     0     0     0

statit from the same timeframe:

disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs

/aggr0/plex0/rg0:

0c.16              9   3.71    0.47   1.00 90842   2.94  15.14  1052   0.30   7.17   442   0.00   ....     .   0.00   ....     .

1b.17             11   3.86    0.47   1.00 126105   3.14  14.31  1170   0.25   2.20  1045   0.00   ....     .   0.00   ....     .

1b.18             35  35.52   33.62   1.24 14841   1.63  26.23   965   0.27  15.09   392   0.00   ....     .   0.00   ....     .

0c.25             78  35.15   33.47   1.13 64924   1.48  28.77  2195   0.20  16.75  1493   0.00   ....     .   0.00   ....     .

0c.24             34  33.96   32.26   1.13 17318   1.51  28.21  1007   0.20  17.00   257   0.00   ....     .   0.00   ....     .

1b.22             36  35.40   33.67   1.15 16802   1.51  28.25  1003   0.22  15.56   721   0.00   ....     .   0.00   ....     .

0c.21             35  34.98   33.27   1.16 17126   1.48  28.75   950   0.22  14.78   820   0.00   ....     .   0.00   ....     .

1b.28             77  34.93   33.02   1.13 66383   1.56  27.40  3447   0.35  10.21  8392   0.00   ....     .   0.00   ....     .

1b.23             32  33.02   31.12   1.17 14775   1.53  27.65  1018   0.37  10.80  1321   0.00   ....     .   0.00   ....     .

0c.20             35  34.41   32.38   1.29 15053   1.66  25.73   976   0.37   9.67  1076   0.00   ....     .   0.00   ....     .

0c.19             34  34.80   33.07   1.20 15961   1.51  28.30   930   0.22  15.00   681   0.00   ....     .   0.00   ....     .

1b.26             76  34.41   32.41   1.05 68532   1.63  26.09  3482   0.37  11.93  7698   0.00   ....     .   0.00   ....     .

1b.27             36  35.15   33.32   1.26 15327   1.56  27.35  1018   0.27  12.82  1170   0.00   ....     .   0.00   ....     .

/aggr0/plex0/rg1:

0c.29              5   2.00    0.00   ....     .   1.63  27.89  1023   0.37   9.80   231   0.00   ....     .   0.00   ....     .

0c.33              5   2.03    0.00   ....     .   1.68  27.13  1095   0.35   8.21   330   0.00   ....     .   0.00   ....     .

0c.34             32  34.46   32.75   1.19 14272   1.51  29.87   927   0.20  16.63   617   0.00   ....     .   0.00   ....     .

0c.35             31  32.85   31.00   1.15 14457   1.51  29.87   895   0.35  12.36  1075   0.00   ....     .   0.00   ....     .

0c.41             32  33.10   31.44   1.20 13396   1.51  29.87   930   0.15  21.83   618   0.00   ....     .   0.00   ....     .

0c.43             31  32.73   30.92   1.19 13827   1.58  28.47  1005   0.22  15.22   920   0.00   ....     .   0.00   ....     .

0c.44             31  32.65   31.02   1.11 14986   1.51  29.85   913   0.12  26.00   408   0.00   ....     .   0.00   ....     .

1b.32             31  32.68   30.87   1.13 14437   1.58  28.48   956   0.22  15.78   627   0.00   ....     .   0.00   ....     .

1b.36             32  34.70   32.95   1.13 14680   1.56  28.94   975   0.20  16.75   582   0.00   ....     .   0.00   ....     .

1b.37             31  32.43   30.70   1.21 13836   1.51  29.89   929   0.22  14.78   797   0.00   ....     .   0.00   ....     .

Re: Disk Utilization "Problem" / Performance Problem

Just keep in mind that sysstat shows only the busiest disk in the "Disk util" column, not an average.
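
To illustrate the difference, here is a small Python sketch using the ut% column from the statit output in the first post (values copied by hand; the variable names are only for illustration). It assumes the sysstat figure is simply the maximum over all disks, as described above. statit averages over its whole sampling period while sysstat reports per second, so the absolute numbers will not match the sysstat lines.

```python
from statistics import mean

# ut% per disk, copied from the statit output earlier in this thread
ut_rg0 = [9, 11, 35, 78, 34, 36, 35, 77, 32, 35, 34, 76, 36]
ut_rg1 = [5, 5, 32, 31, 32, 31, 31, 31, 32, 31]
ut_all = ut_rg0 + ut_rg1

busiest = max(ut_all)   # what a "busiest disk" figure would report
average = mean(ut_all)  # what an average over all disks would report

print(f"busiest disk: {busiest}%  average: {average:.1f}%")
```

With these numbers the busiest disk is at 78% while the average over all disks is around 34%.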

So from the statit output it should be possible to narrow down what is going on... but I have to look up what the output tells us exactly.

From the CP columns it is clear that the system is not very busy with writes: CPs occur every 10 s, which is one of the triggers for writing a consistency point (type T, timer). Accordingly, the larger disk writes also occur at the time of the CP.

Your cache age seems rather low, as well as the cache hit, pointing to really random access.

Net out tracks disk read fairly well; disk read is slightly higher, but I would regard that as normal. Are there any SnapMirror/SnapVault relationships pointing to that system?
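
A rough cross-check of the random-read idea, as a Python sketch. It assumes that the statit "ureads" column is user reads per second, that "chain" is the average number of 4 KB blocks per read, and that the disks with almost no ureads are parity disks. The per-disk values are rounded from the statit output in the first post.

```python
BLOCK_KB = 4            # block size assumed to be 4 KB
data_disks = 19         # disks in statit with ~31-34 ureads/s (assumed to be data disks)
ureads_per_disk = 32.0  # rounded per-disk ureads/s from statit
chain = 1.15            # rounded average blocks per read from statit

# Total user read throughput if every read fetches only ~1 block
kb_per_s = data_disks * ureads_per_disk * chain * BLOCK_KB
print(f"estimated user read throughput: {kb_per_s:.0f} kB/s")
```

That lands at roughly 2.8 MB/s, in the same range as the sysstat "Disk kB/s read" column. Under these assumptions the disks are kept busy by many small reads rather than by bandwidth, which would explain high utilization at only a few MB/s.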

"i think i do have some sort of performance problem, any ideas how to check this out?"

BTW: DO you HAVE a performance problem? Are people complaining about poor response times or similar?

Mark

Re: Disk Utilization "Problem" / Performance Problem

users / helpdesk report slow file transfers. sometimes we do not get above 4-8 MB/sec over cifs (network link is not saturated).

windows roaming profiles get corrupted when they are written back at logout. we did not have that issue when we had them on local disks on a 4 year old sun fire v20z with normal scsi disks.

latest nfs write test (mounted with wsize 16384) is at about 33-36 MB/sec (with different file sizes, from 400 MB to 3 GB).

is such a write speed normal with so many disks in 2 raid groups?

i am testing around a lot, just trying to get some idea of what the problem could be (if there is a problem). i do not have that much experience with netapp filers (we had a fc/block based storage before).

-andy

Re: Disk Utilization "Problem" / Performance Problem

What I see are 3 disks not behaving like the others:

0c.25, 1b.28 and 1b.26

they show >75% utilization but do not handle more xfers than the other disks, which sit at about 35%; they have a much longer round-trip time to fetch the 1.n 4k blocks per read: >60,000 usec compared to roughly 15,000 usec for the other disks.

maybe these disks are slowing down overall performance.
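
The same comparison can be done mechanically. Below is a minimal Python sketch that parses a few statit rows pasted from the first post and flags disks whose user-read latency is far above the median. The function names and the 2x-median threshold are arbitrary choices for illustration, not anything NetApp defines.

```python
from statistics import median

# A few rows pasted from the statit output above:
# disk, ut%, xfers, ureads, chain, usecs
STATIT_ROWS = """\
1b.18             35  35.52   33.62   1.24 14841
0c.25             78  35.15   33.47   1.13 64924
0c.24             34  33.96   32.26   1.13 17318
1b.22             36  35.40   33.67   1.15 16802
1b.28             77  34.93   33.02   1.13 66383
1b.26             76  34.41   32.41   1.05 68532
1b.27             36  35.15   33.32   1.26 15327
"""

def parse_rows(text):
    """Return a list of (disk, ut_pct, ureads_per_s, uread_usecs) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank line or raid group heading
        disk, ut, _xfers, ureads, _chain, usecs = fields[:6]
        if not (ut.isdigit() and usecs.isdigit()):
            continue  # header line, or a disk with no user reads (usecs is ".")
        rows.append((disk, int(ut), float(ureads), int(usecs)))
    return rows

def slow_disks(rows, factor=2.0):
    """Flag disks whose user-read latency exceeds `factor` times the median."""
    typical = median(usecs for _, _, _, usecs in rows)
    return [disk for disk, _, _, usecs in rows if usecs > factor * typical]

print(slow_disks(parse_rows(STATIT_ROWS)))
```

On the rows above this flags 0c.25, 1b.28 and 1b.26, the same three disks.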

But because the given data is only a very short snapshot, you had better investigate with your NetApp partner or NetApp Support to get a more reliable overall picture.

Regarding your write tests: it is normal that a single client cannot saturate the storage system. In such a scenario you often do not measure the write performance of the storage, but the latency of your whole system and network... sorry.

If you really want to know what the storage is capable of, then fire at your storage from multiple clients simultaneously and multithreaded, record with perfstat what the storage is doing during that time, and monitor the resources of the clients you use.

Another means of gathering data is the stats command, e.g.
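
As a starting point for such a test, here is a minimal Python sketch of a multithreaded writer for one client. It is not a real benchmark tool: the file names, sizes and thread count are made-up defaults, the data is zero-filled, and the demo at the bottom writes to a local temp directory that you would replace with a directory on the NFS or CIFS mount.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def write_file(path, size_mb, chunk_kb=64):
    """Write size_mb of zero-filled data to path and return bytes written."""
    chunk = b"\0" * (chunk_kb * 1024)
    written = 0
    with open(path, "wb") as f:
        while written < size_mb * 1024 * 1024:
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the client-side cache
    return written

def run_load(target_dir, threads=8, size_mb=16):
    """Write one file per thread in parallel and return aggregate MB/s."""
    paths = [os.path.join(target_dir, f"loadtest_{i}.bin") for i in range(threads)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total = sum(pool.map(lambda p: write_file(p, size_mb), paths))
    elapsed = time.perf_counter() - start
    for p in paths:
        os.remove(p)  # clean up the test files
    return total / (1024 * 1024) / elapsed

if __name__ == "__main__":
    # Placeholder target: replace the temp dir with a directory on the mount under test.
    with tempfile.TemporaryDirectory() as d:
        print(f"{run_load(d, threads=4, size_mb=4):.1f} MB/s aggregate")
```

Run something like this from several clients at the same time, with file sizes well above the client's RAM cache, while perfstat is recording on the filer.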

>stats show -n 5 -i 2 cifs:cifs_latency

But as I said: I recommend contacting someone who can guide and help you on-site.

regards

Mark

Re: Disk Utilization "Problem" / Performance Problem

thanks for your efforts and help! i'm going to contact netapp support.

-andy

Re: Disk Utilization "Problem" / Performance Problem

Do you have any details about your issue? Where was the problem? Was it a hardware problem?