ONTAP Hardware
Hi,
Hardware:
FAS2554@cdot 8.3.2 with 24*4TB-SATA-Disks, 2*10G Network connectivity
This FAS exports an NFS volume to some ESXi hosts. The FAS/volume is dedicated to receiving large sequential writes, e.g. I use ghettoVCB to back up some VMs to this FAS. The VMs reside on SSDs local to those ESXi hosts, which are also connected at 10G. There is no other load on this FAS and the MTU size is standard.
During such a backup, the write speed on this FAS is about 160-200 MByte/s. Disk utilization (sysstat -x) is about 30%. I wondered about a possible bottleneck and ran "node run -node (nodename) sysstat -M 1":
ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s   CP
 100%   76%   42%   18%  59%  46%  59%  47%  86%       1%     0%       35%       0%      2%   7%      0%     0%     1%    159%( 98%)         18%        0%   0%     8%      0%   2%   4%  3229  41%
 100%   76%   41%   14%  58%  46%  56%  41%  89%       1%     0%       36%       0%      4%  10%      0%     0%     1%    154%( 98%)          8%        0%   0%    12%      0%   2%   2%  3190  80%
 100%   79%   49%   26%  64%  51%  61%  53%  90%       1%     0%       34%       0%      5%  14%      0%     0%     1%    151%( 99%)         27%        0%   0%    11%      0%   2%   8%  3002  66%
 100%   69%   33%    9%  53%  40%  51%  34%  87%       1%     0%       34%       0%      2%   4%      0%     0%     1%    156%( 98%)          0%        0%   0%     7%      0%   2%   3%  3173  69%
 100%   87%   61%   31%  70%  63%  69%  60%  87%       1%     0%       35%       0%      6%  17%      0%     0%     1%    162%( 98%)         27%        0%   0%    19%      0%   2%   9%  2918 100%
Is the FAS CPU a bottleneck here?
About 3000 IOPS at 200 MB/s means roughly 64K per IO. Is that common/normal?
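As a back-of-the-envelope check (plain arithmetic on the two numbers above, nothing measured beyond that):

# rough sanity check of the average IO size implied by the sysstat -M sample above
write_rate = 200 * 1024 * 1024      # ~200 MByte/s observed during the backup
ops_per_second = 3000               # ~3000 NFS ops/s reported by sysstat -M
print(f"{write_rate / ops_per_second / 1024:.0f} KiB per op")   # ~68 KiB, i.e. roughly 64K writes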
Hi, I'm happy to be overruled by any performance experts out there, and while it's based upon a very small sample, I do not believe your system is CPU bound. Looking at the sample you provided, none of the domains are maxed out, nor is any single CPU. However, I am seeing that during the sample the CP time was high. Following on from this, and looking at your FAS spec, the FAS2554 only has a small NVRAM backed by relatively slow SATA disks. It could be that the disks are too slow at writing the data. Obviously, without knowing the RAID group layout, capacity of the aggregates, stats for stripes, any inline storage efficiencies enabled, etc., this cannot be confirmed; however, a good indication would be if you are seeing any back-to-back CPs in the sysstat -x output. Also, the output from the advanced-privilege command "statistics start -preset statit" would show any of the disk/RAID related issues mentioned above.
I also assume the ESX hosts have been eliminated?
Thanks,
Grant.
It's 23 4TB-SATA disks in 2 RAID groups (12+11), plus 1 spare disk. No inline efficiency is active. The aggregate is 65% filled, so 21 TB are free. The boot partition is distributed over multiple disks.
The ESX hosts can read above 700 MByte/s from their SSDs, and there is a 10G network between the hosts and the NetApp. The volume is exported via NFSv3, MTU is 1500.
sysstat -x output during copying the VM files towards Netapp:
 CPU   NFS  CIFS  HTTP  Total    Net kB/s       Disk kB/s     Tape kB/s  Cache Cache   CP  CP  Disk  OTHER  FCP iSCSI  FCP kB/s  iSCSI kB/s
                                  in    out    read   write   read write   age   hit time  ty  util                    in   out    in   out
 82%  4697     0     0   4700 310212   1916   29671  213116      0     0    0s   98% 100%  Hn   57%      3    0     0    0     0     0     0
 78%  3522     0     0   4058 232002   1428   37094  319292      0     0    0s   98%  99%  Hn   72%    536    0     0    0     0     0     0
 90%  3745     0     0   3771 247590   1546   42236  349691      0     0    0s   98%  99%  Hn   88%     26    0     0    0     0     0     0
 84%  4691     0     0   4790 308975   1913   28553  246294      0     0    0s   98%  99%  Hn   63%     99    0     0    0     0     0     0
 85%  3326     0     0   3326 218253   1353   41409  376313      0     0    0s   98% 100%  :f   84%      0    0     0    0     0     0     0
 90%  3673     0     0   3682 242393   1493   42250  382086      0     0    0s   98%  99%  Hs   81%      9    0     0    0     0     0     0
 85%  4128     0     0   4129 272157   1682   39692  293683      0     0    0s   98%  99%  Hf   68%      1    0     0    0     0     0     0
 86%  4100     0     0   4100 270274   1665   34145  304595      0     0    0s   98% 100%  Hf   72%      0    0     0    0     0     0     0
 92%  3205     0     0   3222 211621   1311   35285  327277      0     0    0s   98% 100%  Hf   87%     17    0     0    0     0     0     0
 93%  3745     0     0   3847 246585   1520   21484  306264      0     0    1s   98%  88%  Hs   65%    102    0     0    0     0     0     0
 87%  4339     0     0   4715 286233   1764   26989  242495      0     0    0s   98%  98%  Hn   68%    376    0     0    0     0     0     0
 76%  4065     0     0   4065 267846   1662   26738  271291      0     0    10   98%  98%  Hn   58%      0    0     0    0     0     0     0
 69%  3512     0     0   3648 231198   1422   23073  269955      0     0    0s   98% 100%  :v   63%    136    0     0    0     0     0     0
 92%  3630     0     0   3630 239451   1477   60929  382315      0     0    0s   98%  99%  Hf   87%      0    0     0    0     0     0     0
 80%  3298     0     0   3298 217327   1346   36297  354928      0     0    1s   98% 100%  Hf   76%      0    0     0    0     0     0     0
before the copy process (for comparison):
 CPU   NFS  CIFS  HTTP  Total    Net kB/s       Disk kB/s     Tape kB/s  Cache Cache   CP  CP  Disk  OTHER  FCP iSCSI  FCP kB/s  iSCSI kB/s
                                  in    out    read   write   read write   age   hit time  ty  util                    in   out    in   out
  2%     0     0     0      4      1      1       3       0      0     0    0s  100%   0%   -    1%      4    0     0    0     0     0     0
  1%     0     0     0      1      2      1       0       0      0     0    0s  100%   0%   -    0%      1    0     0    0     0     0     0
  1%     0     0     0     17      1      0       5       0      0     0    0s   97%   0%   -    1%     17    0     0    0     0     0     0
  2%     0     0     0    238      2      2       0       0      0     0    0s  100%   0%   -    0%    238    0     0    0     0     0     0
  1%     1     0     0      1      1      1       0       0      0     0    0s  100%   0%   -    0%      0    0     0    0     0     0     0
  1%     1     0     0     13      1      0       0       0      0     0    0s  100%   0%   -    0%     12    0     0    0     0     0     0
I read up on statit and ran "statit -b" and "statit -e" at the priv level:
Hostname: FAS5n  ID: 0536919087  Memory: 14850 MB
NetApp Release 8.3.2P9: Thu Jan 5 17:58:42 PST 2017 <1O>
Start time: Wed Jul 5 21:19:27 CEST 2017

CPU Statistics
120.295600 time (seconds) 100 %  339.444908 system time 282 %  2.595744 rupt time 2 % (1480014 rupts x 2 usec/rupt)  336.849164 non-rupt system time 280 %  141.737488 idle time 118 %  100.658682 time in CP 84 % 100 %  2.294538 rupt time in CP 2 % (1277002 rupts x 2 usec/rupt)

Multiprocessor Statistics (per second)
cpu0 cpu1 cpu2 cpu3 total
sk switches 5312.17 4320.47 3976.76 1719.25 15328.66  hard switches 5199.93 4220.20 3876.03 1644.39 14940.55  domain switches 363.05 317.38 248.90 414.04 1343.36  CP rupts 1795.90 1691.52 1677.66 5450.45 10615.53  nonCP rupts 333.40 330.54 327.63 696.05 1687.61  IPI rupts 0.00 0.00 0.00 0.00 0.00  grab kahuna 0.00 0.00 0.00 0.00 0.00  grab kahuna usec 0.00 0.00 0.00 0.00 0.00  suspend domain 0.00 0.00 0.00 0.00 0.00  suspend domain usec 0.00 0.00 0.00 0.00 0.00  CP rupt usec 6822.94 1663.73 1702.98 8884.50 19074.16  nonCP rupt usec 1040.44 213.17 214.87 1035.38 2503.87  idle 339617.73 370765.58 298149.14 169710.88 1178243.34  kahuna 108.15 72.31 85.21 20105.60 20371.28  storage 31282.17 30141.51 20982.70 365.31 82771.71  exempt 115366.31 100331.48 65762.56 871.84 282332.21  raid 954.41 836.59 296524.12 1067.54 299382.67  raid_exempt 0.00 0.00 0.00 0.00 0.00  target 18.44 7.71 8.99 0.52 35.66  dnscache 0.00 0.00 0.00 0.00 0.00  cifs 0.00 0.00 0.00 0.00 0.00  wafl_exempt 190206.85 183914.00 113551.43 793781.63 1281453.94  wafl_xcleaner 148191.69 151112.24 42036.11 1403.36 342743.42  sm_exempt 61.82 59.14 30.10 11.21 162.29  protocol 419.97 403.13 273.89 10.31 1107.31  nwk_exclusive 3661.36 2959.70 2417.40 45.26 9083.72  nwk_exempt 116977.89 106979.75 134049.95 2592.75 360600.35  nwk_legacy 1824.59 1658.20 1197.09 22.98 4702.87  hostOS 43328.67 48793.26 22955.49 89.07 115166.50  ssan_exempt 116.51 88.42 57.92 1.80 264.66

FreeBSD CPU state Statistics (per second)
user 3.66 4.81 1.85 0.00 10.32  nice 0.00 0.00 0.00 0.00 0.00  sys 75.15 75.90 82.41 110.38 343.84  intr 0.11 0.22 0.15 0.02 0.50  idle 54.02 51.98 48.52 22.54 177.06  nonrt-pf-cnt 0.75 0.82 0.00 0.00 1.57  nonrt-pf-usec 3.53 3.83 0.00 0.00 7.37  rt-pf-cnt 0.00 0.00 0.00 0.00 0.00  rt-pf-usec 0.00 0.00 0.00 0.00 0.00  kern-pf-cnt 0.00 0.01 0.00 0.00 0.01  kern-pf-usec 0.00 0.08 0.00 0.00 0.08
110.487602 seconds with one or more CPUs active ( 92%)  97.490600 seconds with 2 or more CPUs active ( 81%)  75.987002 seconds with 3 or more CPUs active ( 63%)  12.997001 seconds with one CPU active ( 11%)  21.503598 seconds with 2 CPUs active ( 18%)  21.108768 seconds with 3 CPUs active ( 18%)  54.878234 seconds with all CPUs active ( 46%)

Domain Utilization of Shared Domains (per second)
0.00 idle  869642.23 kahuna  0.00 storage  0.00 exempt  0.00 raid  0.00 raid_exempt  0.00 target  0.00 dnscache  0.00 cifs  0.00 wafl_exempt  0.00 wafl_xcleaner  0.00 sm_exempt  0.00 protocol  362562.02 nwk_exclusive  0.00 nwk_exempt  0.00 nwk_legacy  0.00 hostOS  0.00 ssan_exempt

switch domain to domain (per second)
0.00 idle  362.96 kahuna  112.68 storage  5.78 exempt  7.00 raid  0.00 raid_exempt  0.61 target  0.00 dnscache  0.00 cifs  50.43 wafl_exempt  0.00 wafl_xcleaner  0.00 sm_exempt  1.43 protocol  1.29 nwk_exclusive  300.22 nwk_exempt  500.97 nwk_legacy  0.00 hostOS  0.00 ssan_exempt

Exempt Domain Suspension Stats (per second)

Miscellaneous Statistics (per second)
14940.55 hard context switches  3269.65 NFS operations  0.00 CIFS operations  0.00 HTTP operations  418893.89 network KB received  2574.56 network KB transmitted  29458.27 disk KB read  249984.14 disk KB written  210030.95 NVRAM KB written  0.00 nolog KB written  0.00 WAFL bufs given to clients  0.00 checksum cache hits ( 0%)  0.00 no checksum - partial buffer  0.00 FCP operations  0.00 iSCSI operations

WAFL Statistics (per second)
30.00 name cache hits ( 73%)  10.92 name cache misses ( 27%)  714038.55 buf hash hits ( 86%)  119833.86 buf hash misses ( 14%)  3656.25 inode cache hits ( 100%)  0.10 inode cache misses ( 0%)  12553.47 buf cache hits ( 98%)  302.50 buf cache misses ( 2%)  23.22 blocks read  275.12 blocks read-ahead  78.04 chains read-ahead  0.18 dummy reads  22.94 blocks speculative read-ahead  50983.63 blocks written  91.23 stripes written  484.93 blocks page flipped  572.65 blocks over-written  0.00 wafl_timer generated CP  0.00 snapshot generated CP  0.00 wafl_avail_bufs generated CP  0.62 dirty_blk_cnt generated CP  0.00 full NV-log generated CP  0.00 back-to-back CP  0.00 flush generated CP  0.00 sync generated CP  0.00 deferred back-to-back CP  0.00 low mbufs generated CP  0.00 low datavecs generated CP  0.00 nvlog replay takeover time limit CP  4971.55 non-restart messages  0.16 IOWAIT suspends  1678061804.31 next nvlog nearly full msecs  0.00 dirty buffer susp msecs  0.00 nvlog full susp msecs  0.00 nvlh susp msecs  2728520 buffers

RAID Statistics (per second)
7401.72 xors  0.00 long dispatches [0]  0.00 long consumed [0]  0.00 long consumed hipri [0]  0.00 long low priority [0]  0.00 long high priority [0]  99.86 long monitor tics [0]  0.02 long monitor clears [0]  0.00 long dispatches [1]  0.00 long consumed [1]  0.00 long consumed hipri [1]  0.00 long low priority [1]  99.86 long high priority [1]  99.86 long monitor tics [1]  0.02 long monitor clears [1]  18 max batch  31.29 blocked mode xor  287.84 timed mode xor  6.75 fast adjustments  7.64 slow adjustments  0 avg batch start  0 avg stripe/msec  0.00 checksum dispatches  0.00 checksum consumed  93.44 tetrises written  0.00 master tetrises  0.00 slave tetrises  5751.70 stripes written  1650.02 partial stripes  4101.68 full stripes  50985.84 blocks written  6344.04 blocks read
6.33 1 blocks per stripe size 8  3.52 2 blocks per stripe size 8  3.25 3 blocks per stripe size 8  2.14 4 blocks per stripe size 8  1.61 5 blocks per stripe size 8  0.94 6 blocks per stripe size 8  0.52 7 blocks per stripe size 8  0.18 8 blocks per stripe size 8  15.05 1 blocks per stripe size 9  15.24 2 blocks per stripe size 9  17.30 3 blocks per stripe size 9  32.57 4 blocks per stripe size 9  55.75 5 blocks per stripe size 9  97.13 6 blocks per stripe size 9  173.83 7 blocks per stripe size 9  414.49 8 blocks per stripe size 9  2022.78 9 blocks per stripe size 9  15.00 1 blocks per stripe size 10  12.89 2 blocks per stripe size 10  12.84 3 blocks per stripe size 10  16.36 4 blocks per stripe size 10  30.53 5 blocks per stripe size 10  47.87 6 blocks per stripe size 10  84.23 7 blocks per stripe size 10  168.61 8 blocks per stripe size 10  422.01 9 blocks per stripe size 10  2078.72 10 blocks per stripe size 10

Network Interface Statistics (per second)
iface     side          bytes    packets  multicasts   errors  collisions  pkt drops
e0a       recv           0.00       0.00        0.00     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00
e0b       recv           0.00       0.00        0.00     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00
e0e       recv   215207514.91   11697.72        0.00     0.00        0.00
          xmit         386.18       1.99        0.05     0.00        0.00
e0f       recv         457.46       2.56        0.00     0.00        0.00
          xmit     1318494.11   11092.90        0.06     0.00        0.00
e0M       recv        1706.76      11.37        1.63     0.00        0.00
          xmit        7157.08       9.93        0.02     0.00        0.00
e0P       recv           7.07       0.08        0.00     0.00        0.00
          xmit           8.31       0.10        0.02     0.00        0.00
a0a       recv   215207972.37   11700.28        0.10     0.00        0.00
          xmit     1318880.29   11094.89        0.11     0.00        0.00
a0a-1201  recv           0.00       0.00        0.00     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00
a0a-2486  recv   213737593.48   11621.55        0.00     0.00        0.00
          xmit     1310305.39   11022.07        0.00     0.00        0.00
a0a-3608  recv          66.33       1.08        1.08     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00

Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data-transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round-trip time per 4K block.
disk      ut%   xfers  ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
/sata_wh/plex0/rg0:
0a.10.13   42  115.87    0.00  ....     .   47.09  61.39   183   68.79  12.76   387    0.00  ....     .    0.00  ....     .
0a.10.15   49  117.58    0.00  ....     .   48.85  59.25   246   68.74  12.76   527    0.00  ....     .    0.00  ....     .
0a.10.2    59  104.88   15.03  1.02 39509   65.23  41.53   385   24.63   7.53  1887    0.00  ....     .    0.00  ....     .
0a.10.3    50   97.65   16.48  1.02 37285   62.54  43.32   368   18.63  10.53  1049    0.00  ....     .    0.00  ....     .
0a.10.4    50   96.12   15.57  1.02 34450   62.29  43.83   360   18.27   9.13  1229    0.00  ....     .    0.00  ....     .
0a.10.5    49   95.43   16.14  1.02 36738   61.30  44.44   350   17.99   9.48  1125    0.00  ....     .    0.00  ....     .
0a.10.6    50   97.65   16.02  1.02 36134   62.41  43.57   359   19.22   9.24  1177    0.00  ....     .    0.00  ....     .
0a.10.7    49   95.28   15.21  1.02 36052   61.87  44.00   358   18.20   9.55  1085    0.00  ....     .    0.00  ....     .
0a.10.8    49   97.18   15.88  1.02 35157   62.44  43.51   363   18.86   9.69  1100    0.00  ....     .    0.00  ....     .
0a.10.9    50   95.67   16.17  1.02 36281   61.38  44.28   352   18.12   9.97  1124    0.00  ....     .    0.00  ....     .
0a.10.1    53  100.78   16.73  1.02 37001   64.45  42.40   376   19.60   9.46  1197    0.00  ....     .    0.00  ....     .
0a.10.0    52   99.00   16.14  1.02 38089   63.48  43.05   365   19.38   9.50  1232    0.00  ....     .    0.00  ....     .
/sata_wh/plex0/rg1:
0a.10.10   40  108.60    0.00  ....     .   45.42  62.65   186   63.18  14.06   363    0.00  ....     .    0.00  ....     .
0a.10.12   44  109.27    0.00  ....     .   46.11  61.74   232   63.16  14.05   467    0.00  ....     .    0.00  ....     .
0a.10.14   50   97.84   15.16  1.00 33412   63.47  41.64   432   19.21  10.57  1077    0.00  ....     .    0.00  ....     .
0a.10.16   50   96.24   14.95  1.01 33250   62.32  42.38   411   18.96  10.53  1072    0.00  ....     .    0.00  ....     .
0a.10.17   50   96.26   15.25  1.01 34504   62.32  42.50   410   18.68  10.17  1101    0.00  ....     .    0.00  ....     .
0a.10.18   50   95.81   16.37  1.00 34363   61.09  43.29   418   18.35  10.21  1167    0.00  ....     .    0.00  ....     .
0a.10.19   50   94.00   14.66  1.00 32563   61.32  43.23   411   18.02  10.28  1068    0.00  ....     .    0.00  ....     .
0a.10.20   50   94.74   15.87  1.00 33920   60.97  43.50   407   17.90  10.90  1052    0.00  ....     .    0.00  ....     .
0a.10.21   50   94.08   14.95  1.00 32229   60.83  43.63   414   18.29  10.18  1152    0.00  ....     .    0.00  ....     .
0a.10.22   50   95.56   15.55  1.00 34687   61.55  43.07   418   18.46  10.57  1128    0.00  ....     .    0.00  ....     .
0a.10.23   50   94.56   15.67  1.00 32724   60.64  43.77   410   18.25  10.51  1145    0.00  ....     .    0.00  ....     .
/aggr0_fas5n/plex0/rg0:
0a.10.0    52   99.00   16.14  1.02 38089   63.48  43.05   365   19.38   9.50  1232    0.00  ....     .    0.00  ....     .
0a.10.1    53  100.78   16.73  1.02 37001   64.45  42.40   376   19.60   9.46  1197    0.00  ....     .    0.00  ....     .
0a.10.2    59  104.88   15.03  1.02 39509   65.23  41.53   385   24.63   7.53  1887    0.00  ....     .    0.00  ....     .
0a.10.3    50   97.65   16.48  1.02 37285   62.54  43.32   368   18.63  10.53  1049    0.00  ....     .    0.00  ....     .
0a.10.4    50   96.12   15.57  1.02 34450   62.29  43.83   360   18.27   9.13  1229    0.00  ....     .    0.00  ....     .
0a.10.5    49   95.43   16.14  1.02 36738   61.30  44.44   350   17.99   9.48  1125    0.00  ....     .    0.00  ....     .
0a.10.6    50   97.65   16.02  1.02 36134   62.41  43.57   359   19.22   9.24  1177    0.00  ....     .    0.00  ....     .
0a.10.7    49   95.28   15.21  1.02 36052   61.87  44.00   358   18.20   9.55  1085    0.00  ....     .    0.00  ....     .
0a.10.8    49   97.18   15.88  1.02 35157   62.44  43.51   363   18.86   9.69  1100    0.00  ....     .    0.00  ....     .
0a.10.9    50   95.67   16.17  1.02 36281   61.38  44.28   352   18.12   9.97  1124    0.00  ....     .    0.00  ....     .
Aggregate statistics:
Minimum    40   94.00    0.00               45.42                17.90                 0.00                 0.00
Mean       50   99.08   13.85               60.50                24.72                 0.00                 0.00
Maximum    59  117.58   16.73               65.23                68.79                 0.00                 0.00
Spares and other disks:
0a.10.11    0    0.00    0.00  ....     .    0.00  ....     .    0.00  ....     .      0.00  ....     .    0.00  ....     .

FCP Statistics (per second)
0.00 FCP Bytes recv  0.00 FCP Bytes sent  0.00 FCP ops

iSCSI Statistics (per second)
0.00 iSCSI Bytes recv  0.00 iSCSI Bytes xmit  0.00 iSCSI ops

Tape Statistics (per second)
tape            write bytes  blocks   read bytes  blocks
BC6505-1:3.126         0.00    0.00         0.00    0.00
BC6505-1:7.126         0.00    0.00         0.00    0.00
BC6505-1:4.126         0.00    0.00         0.00    0.00
BC6505-1:8.126         0.00    0.00         0.00    0.00

Interrupt Statistics (per second)
1585.72 MSI[256](3:49) (1,0,0) PMC SAS/SATA Controller 8001
0.27 MSIX[258](2:50) (3,0,0) QLogic EP 8324 PCI-Express FC controller
0.23 MSIX[260](0:50) (3,0,1) QLogic EP 8324 PCI-Express FC controller
1.06 MSIX[261](1:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
3.06 MSIX[262](2:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
2560.91 MSIX[263](3:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
1.51 MSIX[264](0:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
1.67 MSIX[265](1:52) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
1.06 MSIX[266](2:52) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
0.44 MSIX[267](3:52) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
128.64 MSIX[268](0:53) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
1.31 MSIX[269](1:53) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
1.81 MSIX[270](2:53) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
14.72 MSI[277](1:55) (5,0,2) 82580 Quad Copper Gigabit
0.18 MSI[278](2:55) (5,0,3) 82580 Quad Copper Gigabit
0.00 RTC
0.00 IPI
999.37 Msec Clock
5301.96 total

NVRAM Statistics (per second)
0.00 total dma transfer KB  210029.27 wafl write req data KB  0.00 dma transactions  0.00 dma destriptors  0.00 waitdone preempts  0.00 waitdone delays  84.87 transactions not queued  3295.91 transactions queued  3302.55 transactions done  5929480.25 total ldma waittime (MS)  5205488476.82 total rdma waittime (MS)  29.39 completion wakeups  210030.95 total nvlog KB  0.00 total nolog KB  0.00 empty entry descriptor pool  0.00 channel1 dma transfer KB  0.00 channel1 dma transactions  0.00 channel1 dma descriptors

FlexLog Statistics
Initiate transfer latency
xfer size <=      count   lcl avg   lcl max   rmt avg   rmt max
       16:     44534094      1.65    104.03      0.15     16.17
      256:     14452544      6.93    224.71      0.03     10.34
     4096:      6592771      1.75     81.59      0.00      0.00
    65536:         1881      2.93     15.72      0.12      0.12
  1048576:            0      0.00      0.00      0.00      0.00
---------------------------
ldma isdone: 2.64 us  Max: 94.01 us
ldma waitdone: 6.91 us  Max: 177.92 us
rdma isdone: 0.00 us  Max: 0.00 us
rdma waitdone: 0.00 us  Max: 0.00 us
BE Resources Depleted Cnt: 0
No Bucket Ready Cnt: 0
That's a ton of information and I can't find the bottleneck.
sysstat -x's output shows the bottleneck quite clearly under CP (Consistency Point) Time - it is spending almost 100% of its time flushing data to disk. CP Type shows why it is flushing to disk.
This document explains Consistency Points and the impact on performance - https://kb.netapp.com/support/s/article/ka21A0000000jpHQAQ/faq-consistency-point?language=en_US
The FAS2554 has 1280MB of NVRAM - but it is mirroring the other controller, so it has 640MB available. Once it hits a high watermark for that 640MB, it flushes to disk. While it is flushing to disk, any incoming writes are held.
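As a rough illustration only (the 640MB figure is from above, but the watermark fraction and incoming rate are assumptions, and the real trigger logic is more involved than this):

# toy model of how often a watermark-triggered CP would fire at this write rate
usable_nvram_mb = 640     # assumed usable share of the FAS2554 NVRAM
watermark = 0.5           # assumed high-watermark fraction that kicks off a CP
incoming_mb_s = 250       # roughly the "Net in" rate from your sysstat -x output

print(f"~{usable_nvram_mb * watermark / incoming_mb_s:.1f} s between watermark CPs")   # ~1.3 s
# which lines up with the 'H' (high watermark) CPs showing up in almost every 1-second sample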
Sorry to say, but I don't think you're likely to hit much higher than 200MByte/sec. Our internal modeling tools suggest you might get a little bit higher, but not too much.
Hi,
if the speed is limited by the disk flushing process here, why is disk utilization only at 40 to 50%? Is there another limiting factor between NVRAM and disks?
Would the write speed increase if we increased the aggregate's disk count by adding another shelf (e.g. a shelf with 20 4TB disks and 4 SSDs)? (The controller is a single one; it's not HA, since this system is only for secondary backup purposes.)
Hmm, OK, as a non-HA system you get more NVRAM.
But you still have 70-80%+ disk utilisation in the sysstat -x output.
Flash Pool helps more with reads than writes (it only works for overwrites) - so no point in a hybrid shelf.
With the system checkpointing at the high watermark almost every second, I don't believe adding disks will help significantly. I would suggest ensuring your system is running 8.3.2P11 and checking again after the ONTAP upgrade.
Hmm,
I patched the filer to 8.3.2P11. I will repeat the measurement, but I can't do it right now because after the reboot the filer started to update all its SnapMirror relationships. (The filer is also a SnapMirror destination for some volumes on other filers. These transfers usually run at night, in different time windows than the ESXi transfers.)
I would expect that with more disks the filer could flush NVRAM to disk and complete a CP faster, so performance would rise.
The system won't CP more often than once per second, and it's running a CP every second because the high watermark for the NVRAM is full, and the CPs are mostly (from the small sample) finishing within the second. If the CPs were taking longer than a second, yes, more disks might help.
I hear you about more disks enabling the system to return from CP quicker - but then we are back to the NVRAM filling and needing to flush again on the same schedule.
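To make that concrete with a deliberately oversimplified model (a single watermark-sized buffer and made-up numbers; the real behaviour with the second NVRAM half is messier):

# sustained write rate is capped either by the clients' offered rate or by how fast
# a watermark's worth of NVRAM can be flushed - whichever is worse
def sustained_mb_s(offered_mb_s, watermark_mb, flush_seconds):
    fill_seconds = watermark_mb / offered_mb_s
    if flush_seconds <= fill_seconds:      # CP finishes before the next one is due
        return offered_mb_s                # the ingest rate is the limit, not the disks
    return watermark_mb / flush_seconds    # back-to-back CPs: the disks set the pace

print(sustained_mb_s(250, 320, 1.0))   # 250 -> the disks already keep up
print(sustained_mb_s(250, 320, 0.5))   # 250 -> halving the flush time changes nothing
print(sustained_mb_s(250, 320, 2.0))   # 160 -> only here would more spindles pay off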
I will disclaim that others may see something else in the stats you have posted, but this is as far as my experience would dig.
Again I will bow down to the performance experts, however a couple of comments from the stats you've provided...
I may be reading this wrong, however from the sysstat output there appears to be a larger amount of disk writes than net in, suggesting a possible misalignment on the VMware side. Are you able to check this: https://kb.netapp.com/support/s/article/ka31A0000000x5ZQAQ/how-to-identify-misalignment-over-nfs-on-the-storage-system?language=en_US
Also, from the statit output the xfers (IOPS) of the disks are up at around 100 per disk. This is generally the most you can get out of a SATA disk, so they look to be running flat out, despite what the ut% is stating. This being the case, more disks may actually help; however, since most of the new writes will go to the new disks, this may initially adversely affect performance, see next paragraph...
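To put rough numbers on that (simple arithmetic on the statit disk columns above; the data-disk count assumes RAID-DP, i.e. 2 parity disks per RAID group):

# per-disk write throughput implied by the statit disk table above
writes_per_sec = 62        # typical "writes" xfers/s on one of your data disks
blocks_per_chain = 43      # typical write chain length (4K blocks per command)
data_disks = 19            # 23 disks in 2 RGs minus 2 parity disks per RG

per_disk = writes_per_sec * blocks_per_chain * 4 / 1024
print(f"~{per_disk:.0f} MB/s per data disk, ~{per_disk * data_disks:.0f} MB/s across the data disks")
# ~10 MB/s per disk and ~200 MB/s in total - the same ballpark as the observed backup rate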
The data is only being written to full stripes about 60% of the time, suggesting the aggregate is filling, or was filled previously and had a lot of old data deleted. Maybe looking at a reallocate would help, both to increase the number of contiguous free blocks and, if a new RG or two are added, to even out the used space between the disks and therefore allow all spindles to be written to. See https://library.netapp.com/ecmdocs/ECMLP2348025/html/reallocate/measure.html
However, I would agree with Alex that your underlying issue is the small NVRAM in the system. If a hardware upgrade is not possible, then by removing the HA partner (and therefore removing the nodes from the cluster) you would be able to utilise the entire NVRAM.
Thanks,
Grant.
Hi,
thanks to both of you for your input. It's a single system, so it doesn't have to share its 2GB NVRAM.
I did a "reallocate measure (path to my volume) -once". The result in event log is "Allocation measurement check on '/vol/backup2nfs' is 2.". As far as I know this is a good value.
But free space reallocation at the aggregate level is set to "off". The aggregate was filled up to about 90% some weeks ago (mainly due to large, mostly empty thick-provisioned volumes), but is now down to 65%.
About misalignment:
VMs are not active in this NFS volume. Their VMDKs are only copied to the volume for backup. I always thought misalignment is a problem if I write inside a VM which lies misaligned in a volume. "nfsstat -d" does not seem to exist in cDOT, even at the debug command level. It seems to be a 7-Mode command.
Hi, you are correct, a reallocate measure of 2 suggests a very healthy layout of the used blocks. However, it does appear your free space is not as contiguous as it could be (typical of a previously full aggregate); maybe scheduling a reallocate would assist here, however I do believe this would only have a slight impact. Please see https://kb.netapp.com/support/s/article/ka31A0000000xBcQAI/how-to-set-a-reallocate-schedule-in-clustered-data-ontap?language=en_US for details.
Regarding the misalignment, OK, sorry I misunderstood the setup. Being just an NFS volume, as you state it will not be an issue.
I'm now not sure I can assist further. I'm not sure the free space reallocate will give you enough to make a difference, and the controller is already in a single-node config, so the NVRAM is already maximised. I do believe we're at the point where you are limited by the hardware!
Good luck,
Grant.
Yes, it seems so. I will run reallocation scans over the weekend just to be sure.
Do you have an idea what causes the read requests during the writing of the files? The reads amount to about 10-15% of all written kilobytes in the sysstat output I posted on page 1.
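(Quick check of that percentage, just summing a few of the (disk read, disk write) kB/s pairs from the sysstat -x output above - hand-copied values, so only approximate:)

samples = [(29671, 213116), (37094, 319292), (42236, 349691), (41409, 376313), (21484, 306264)]
reads = sum(r for r, w in samples)
writes = sum(w for r, w in samples)
print(f"reads are ~{100 * reads / writes:.0f}% of writes")   # ~11% for these samples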
Looking at the statit output, most of the reads are of type cpreads, meaning it's performing parity calculations, hence the majority of that IO hitting the 2 parity disks in each RG. When it can't write a full stripe it needs to read the existing parity (and data) and recalculate the parity before it can write the new data.
Otherwise, likely to be background WAFL scanners (privilege command: system node run -node <node> wafl scan status).
Cheers,
Grant.