ONTAP Hardware
Hi,
Hardware:
FAS2554@cdot 8.3.2 with 24*4TB-SATA-Disks, 2*10G Network connectivity
This FAS exports an NFS volume to some ESXi hosts. The FAS/volume is dedicated to receiving large sequential writes, e.g. I use ghettoVCB to back up some VMs to this FAS. The VMs reside on SSDs local to those ESXi hosts, which are also connected at 10G. There is no other load on this FAS and the MTU size is standard.
During such a backup, the write speed on this FAS is about 160-200 MByte/s. Disk utilization (sysstat -x) is about 30%. I wondered about a possible bottleneck and ran "node run -node (nodename) sysstat -M 1":
ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Nwk_Excl Nwk_Lg Nwk_Exmpt Protocol Storage Raid Raid_Ex Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt SSAN_Ex Intr Host Ops/s   CP
 100%   76%   42%   18%  59%  46%  59%  47%  86%       1%     0%       35%       0%      2%   7%      0%     0%     1%    159%( 98%)         18%        0%   0%     8%      0%   2%   4%  3229  41%
 100%   76%   41%   14%  58%  46%  56%  41%  89%       1%     0%       36%       0%      4%  10%      0%     0%     1%    154%( 98%)          8%        0%   0%    12%      0%   2%   2%  3190  80%
 100%   79%   49%   26%  64%  51%  61%  53%  90%       1%     0%       34%       0%      5%  14%      0%     0%     1%    151%( 99%)         27%        0%   0%    11%      0%   2%   8%  3002  66%
 100%   69%   33%    9%  53%  40%  51%  34%  87%       1%     0%       34%       0%      2%   4%      0%     0%     1%    156%( 98%)          0%        0%   0%     7%      0%   2%   3%  3173  69%
 100%   87%   61%   31%  70%  63%  69%  60%  87%       1%     0%       35%       0%      6%  17%      0%     0%     1%    162%( 98%)         27%        0%   0%    19%      0%   2%   9%  2918 100%
Is the FAS CPU a bottleneck here?
About 3000 IOPS at 200 MB/s means roughly 64K per IO. Is that common/normal?
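As a back-of-the-envelope check (plain arithmetic on the two numbers above, nothing measured beyond that):

# rough sanity check of the average IO size implied by the sysstat -M sample above
write_rate = 200 * 1024 * 1024      # ~200 MByte/s observed during the backup
ops_per_second = 3000               # ~3000 NFS ops/s reported by sysstat -M
print(f"{write_rate / ops_per_second / 1024:.0f} KiB per op")   # ~68 KiB, i.e. roughly 64K writes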
Hi, I'm happy to be overruled by any performance experts out there, and while it's based upon a very small sample, I do not believe your system is CPU bound. Looking at the sample you provided, none of the domains are maxed out, nor is any single CPU. However, I am seeing that during the sample the CP time was high. Following on from this, and looking at your FAS spec, the FAS2554 only has a small NVRAM backed by relatively slow SATA disks. It could be that the disks are too slow at writing the data. Obviously, without knowing the RAID group layout, capacity of the aggregates, stats for stripes, any inline storage efficiencies enabled, etc., this cannot be confirmed; however, a good indication would be if you are seeing any back-to-back CPs in the sysstat -x output. Also, the output from the advanced-privilege command "statistics start -preset statit" would show any of the disk/RAID related issues mentioned above.
I also assume the ESX hosts have been eliminated?
Thanks,
Grant.
It's 23 4TB-SATA disks in 2 RAID groups (12+11), plus 1 spare disk. No inline efficiency is active. The aggregate is 65% filled, so 21 TB are free. The boot partition is distributed over multiple disks.
The ESX hosts can read above 700 MByte/s from their SSDs, and there is a 10G network between the hosts and the NetApp. The volume is exported via NFSv3, MTU is 1500.
sysstat -x output during copying the VM files towards Netapp:
 CPU   NFS  CIFS  HTTP  Total    Net kB/s       Disk kB/s     Tape kB/s  Cache Cache   CP  CP  Disk  OTHER  FCP iSCSI  FCP kB/s  iSCSI kB/s
                                  in    out    read   write   read write   age   hit time  ty  util                    in   out    in   out
 82%  4697     0     0   4700 310212   1916   29671  213116      0     0    0s   98% 100%  Hn   57%      3    0     0    0     0     0     0
 78%  3522     0     0   4058 232002   1428   37094  319292      0     0    0s   98%  99%  Hn   72%    536    0     0    0     0     0     0
 90%  3745     0     0   3771 247590   1546   42236  349691      0     0    0s   98%  99%  Hn   88%     26    0     0    0     0     0     0
 84%  4691     0     0   4790 308975   1913   28553  246294      0     0    0s   98%  99%  Hn   63%     99    0     0    0     0     0     0
 85%  3326     0     0   3326 218253   1353   41409  376313      0     0    0s   98% 100%  :f   84%      0    0     0    0     0     0     0
 90%  3673     0     0   3682 242393   1493   42250  382086      0     0    0s   98%  99%  Hs   81%      9    0     0    0     0     0     0
 85%  4128     0     0   4129 272157   1682   39692  293683      0     0    0s   98%  99%  Hf   68%      1    0     0    0     0     0     0
 86%  4100     0     0   4100 270274   1665   34145  304595      0     0    0s   98% 100%  Hf   72%      0    0     0    0     0     0     0
 92%  3205     0     0   3222 211621   1311   35285  327277      0     0    0s   98% 100%  Hf   87%     17    0     0    0     0     0     0
 93%  3745     0     0   3847 246585   1520   21484  306264      0     0    1s   98%  88%  Hs   65%    102    0     0    0     0     0     0
 87%  4339     0     0   4715 286233   1764   26989  242495      0     0    0s   98%  98%  Hn   68%    376    0     0    0     0     0     0
 76%  4065     0     0   4065 267846   1662   26738  271291      0     0    10   98%  98%  Hn   58%      0    0     0    0     0     0     0
 69%  3512     0     0   3648 231198   1422   23073  269955      0     0    0s   98% 100%  :v   63%    136    0     0    0     0     0     0
 92%  3630     0     0   3630 239451   1477   60929  382315      0     0    0s   98%  99%  Hf   87%      0    0     0    0     0     0     0
 80%  3298     0     0   3298 217327   1346   36297  354928      0     0    1s   98% 100%  Hf   76%      0    0     0    0     0     0     0
before the copy process (for comparison):
 CPU   NFS  CIFS  HTTP  Total    Net kB/s       Disk kB/s     Tape kB/s  Cache Cache   CP  CP  Disk  OTHER  FCP iSCSI  FCP kB/s  iSCSI kB/s
                                  in    out    read   write   read write   age   hit time  ty  util                    in   out    in   out
  2%     0     0     0      4      1      1       3       0      0     0    0s  100%   0%   -    1%      4    0     0    0     0     0     0
  1%     0     0     0      1      2      1       0       0      0     0    0s  100%   0%   -    0%      1    0     0    0     0     0     0
  1%     0     0     0     17      1      0       5       0      0     0    0s   97%   0%   -    1%     17    0     0    0     0     0     0
  2%     0     0     0    238      2      2       0       0      0     0    0s  100%   0%   -    0%    238    0     0    0     0     0     0
  1%     1     0     0      1      1      1       0       0      0     0    0s  100%   0%   -    0%      0    0     0    0     0     0     0
  1%     1     0     0     13      1      0       0       0      0     0    0s  100%   0%   -    0%     12    0     0    0     0     0     0
I read up on statit and ran "statit -b" and "statit -e" at the priv level:
Hostname: FAS5n  ID: 0536919087  Memory: 14850 MB
NetApp Release 8.3.2P9: Thu Jan 5 17:58:42 PST 2017 <1O>
Start time: Wed Jul 5 21:19:27 CEST 2017

CPU Statistics
120.295600 time (seconds) 100 %  339.444908 system time 282 %  2.595744 rupt time 2 % (1480014 rupts x 2 usec/rupt)  336.849164 non-rupt system time 280 %  141.737488 idle time 118 %  100.658682 time in CP 84 % 100 %  2.294538 rupt time in CP 2 % (1277002 rupts x 2 usec/rupt)

Multiprocessor Statistics (per second)
cpu0 cpu1 cpu2 cpu3 total
sk switches 5312.17 4320.47 3976.76 1719.25 15328.66  hard switches 5199.93 4220.20 3876.03 1644.39 14940.55  domain switches 363.05 317.38 248.90 414.04 1343.36  CP rupts 1795.90 1691.52 1677.66 5450.45 10615.53  nonCP rupts 333.40 330.54 327.63 696.05 1687.61  IPI rupts 0.00 0.00 0.00 0.00 0.00  grab kahuna 0.00 0.00 0.00 0.00 0.00  grab kahuna usec 0.00 0.00 0.00 0.00 0.00  suspend domain 0.00 0.00 0.00 0.00 0.00  suspend domain usec 0.00 0.00 0.00 0.00 0.00  CP rupt usec 6822.94 1663.73 1702.98 8884.50 19074.16  nonCP rupt usec 1040.44 213.17 214.87 1035.38 2503.87  idle 339617.73 370765.58 298149.14 169710.88 1178243.34  kahuna 108.15 72.31 85.21 20105.60 20371.28  storage 31282.17 30141.51 20982.70 365.31 82771.71  exempt 115366.31 100331.48 65762.56 871.84 282332.21  raid 954.41 836.59 296524.12 1067.54 299382.67  raid_exempt 0.00 0.00 0.00 0.00 0.00  target 18.44 7.71 8.99 0.52 35.66  dnscache 0.00 0.00 0.00 0.00 0.00  cifs 0.00 0.00 0.00 0.00 0.00  wafl_exempt 190206.85 183914.00 113551.43 793781.63 1281453.94  wafl_xcleaner 148191.69 151112.24 42036.11 1403.36 342743.42  sm_exempt 61.82 59.14 30.10 11.21 162.29  protocol 419.97 403.13 273.89 10.31 1107.31  nwk_exclusive 3661.36 2959.70 2417.40 45.26 9083.72  nwk_exempt 116977.89 106979.75 134049.95 2592.75 360600.35  nwk_legacy 1824.59 1658.20 1197.09 22.98 4702.87  hostOS 43328.67 48793.26 22955.49 89.07 115166.50  ssan_exempt 116.51 88.42 57.92 1.80 264.66

FreeBSD CPU state Statistics (per second)
user 3.66 4.81 1.85 0.00 10.32  nice 0.00 0.00 0.00 0.00 0.00  sys 75.15 75.90 82.41 110.38 343.84  intr 0.11 0.22 0.15 0.02 0.50  idle 54.02 51.98 48.52 22.54 177.06  nonrt-pf-cnt 0.75 0.82 0.00 0.00 1.57  nonrt-pf-usec 3.53 3.83 0.00 0.00 7.37  rt-pf-cnt 0.00 0.00 0.00 0.00 0.00  rt-pf-usec 0.00 0.00 0.00 0.00 0.00  kern-pf-cnt 0.00 0.01 0.00 0.00 0.01  kern-pf-usec 0.00 0.08 0.00 0.00 0.08
110.487602 seconds with one or more CPUs active ( 92%)  97.490600 seconds with 2 or more CPUs active ( 81%)  75.987002 seconds with 3 or more CPUs active ( 63%)  12.997001 seconds with one CPU active ( 11%)  21.503598 seconds with 2 CPUs active ( 18%)  21.108768 seconds with 3 CPUs active ( 18%)  54.878234 seconds with all CPUs active ( 46%)

Domain Utilization of Shared Domains (per second)
0.00 idle  869642.23 kahuna  0.00 storage  0.00 exempt  0.00 raid  0.00 raid_exempt  0.00 target  0.00 dnscache  0.00 cifs  0.00 wafl_exempt  0.00 wafl_xcleaner  0.00 sm_exempt  0.00 protocol  362562.02 nwk_exclusive  0.00 nwk_exempt  0.00 nwk_legacy  0.00 hostOS  0.00 ssan_exempt

switch domain to domain (per second)
0.00 idle  362.96 kahuna  112.68 storage  5.78 exempt  7.00 raid  0.00 raid_exempt  0.61 target  0.00 dnscache  0.00 cifs  50.43 wafl_exempt  0.00 wafl_xcleaner  0.00 sm_exempt  1.43 protocol  1.29 nwk_exclusive  300.22 nwk_exempt  500.97 nwk_legacy  0.00 hostOS  0.00 ssan_exempt

Exempt Domain Suspension Stats (per second)

Miscellaneous Statistics (per second)
14940.55 hard context switches  3269.65 NFS operations  0.00 CIFS operations  0.00 HTTP operations  418893.89 network KB received  2574.56 network KB transmitted  29458.27 disk KB read  249984.14 disk KB written  210030.95 NVRAM KB written  0.00 nolog KB written  0.00 WAFL bufs given to clients  0.00 checksum cache hits ( 0%)  0.00 no checksum - partial buffer  0.00 FCP operations  0.00 iSCSI operations

WAFL Statistics (per second)
30.00 name cache hits ( 73%)  10.92 name cache misses ( 27%)  714038.55 buf hash hits ( 86%)  119833.86 buf hash misses ( 14%)  3656.25 inode cache hits ( 100%)  0.10 inode cache misses ( 0%)  12553.47 buf cache hits ( 98%)  302.50 buf cache misses ( 2%)  23.22 blocks read  275.12 blocks read-ahead  78.04 chains read-ahead  0.18 dummy reads  22.94 blocks speculative read-ahead  50983.63 blocks written  91.23 stripes written  484.93 blocks page flipped  572.65 blocks over-written  0.00 wafl_timer generated CP  0.00 snapshot generated CP  0.00 wafl_avail_bufs generated CP  0.62 dirty_blk_cnt generated CP  0.00 full NV-log generated CP  0.00 back-to-back CP  0.00 flush generated CP  0.00 sync generated CP  0.00 deferred back-to-back CP  0.00 low mbufs generated CP  0.00 low datavecs generated CP  0.00 nvlog replay takeover time limit CP  4971.55 non-restart messages  0.16 IOWAIT suspends  1678061804.31 next nvlog nearly full msecs  0.00 dirty buffer susp msecs  0.00 nvlog full susp msecs  0.00 nvlh susp msecs  2728520 buffers

RAID Statistics (per second)
7401.72 xors  0.00 long dispatches [0]  0.00 long consumed [0]  0.00 long consumed hipri [0]  0.00 long low priority [0]  0.00 long high priority [0]  99.86 long monitor tics [0]  0.02 long monitor clears [0]  0.00 long dispatches [1]  0.00 long consumed [1]  0.00 long consumed hipri [1]  0.00 long low priority [1]  99.86 long high priority [1]  99.86 long monitor tics [1]  0.02 long monitor clears [1]  18 max batch  31.29 blocked mode xor  287.84 timed mode xor  6.75 fast adjustments  7.64 slow adjustments  0 avg batch start  0 avg stripe/msec  0.00 checksum dispatches  0.00 checksum consumed  93.44 tetrises written  0.00 master tetrises  0.00 slave tetrises  5751.70 stripes written  1650.02 partial stripes  4101.68 full stripes  50985.84 blocks written  6344.04 blocks read
6.33 1 blocks per stripe size 8  3.52 2 blocks per stripe size 8  3.25 3 blocks per stripe size 8  2.14 4 blocks per stripe size 8  1.61 5 blocks per stripe size 8  0.94 6 blocks per stripe size 8  0.52 7 blocks per stripe size 8  0.18 8 blocks per stripe size 8  15.05 1 blocks per stripe size 9  15.24 2 blocks per stripe size 9  17.30 3 blocks per stripe size 9  32.57 4 blocks per stripe size 9  55.75 5 blocks per stripe size 9  97.13 6 blocks per stripe size 9  173.83 7 blocks per stripe size 9  414.49 8 blocks per stripe size 9  2022.78 9 blocks per stripe size 9  15.00 1 blocks per stripe size 10  12.89 2 blocks per stripe size 10  12.84 3 blocks per stripe size 10  16.36 4 blocks per stripe size 10  30.53 5 blocks per stripe size 10  47.87 6 blocks per stripe size 10  84.23 7 blocks per stripe size 10  168.61 8 blocks per stripe size 10  422.01 9 blocks per stripe size 10  2078.72 10 blocks per stripe size 10

Network Interface Statistics (per second)
iface     side          bytes    packets  multicasts   errors  collisions  pkt drops
e0a       recv           0.00       0.00        0.00     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00
e0b       recv           0.00       0.00        0.00     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00
e0e       recv   215207514.91   11697.72        0.00     0.00        0.00
          xmit         386.18       1.99        0.05     0.00        0.00
e0f       recv         457.46       2.56        0.00     0.00        0.00
          xmit     1318494.11   11092.90        0.06     0.00        0.00
e0M       recv        1706.76      11.37        1.63     0.00        0.00
          xmit        7157.08       9.93        0.02     0.00        0.00
e0P       recv           7.07       0.08        0.00     0.00        0.00
          xmit           8.31       0.10        0.02     0.00        0.00
a0a       recv   215207972.37   11700.28        0.10     0.00        0.00
          xmit     1318880.29   11094.89        0.11     0.00        0.00
a0a-1201  recv           0.00       0.00        0.00     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00
a0a-2486  recv   213737593.48   11621.55        0.00     0.00        0.00
          xmit     1310305.39   11022.07        0.00     0.00        0.00
a0a-3608  recv          66.33       1.08        1.08     0.00        0.00
          xmit           0.00       0.00        0.00     0.00        0.00

Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data-transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round-trip time per 4K block.
disk      ut%   xfers  ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
/sata_wh/plex0/rg0:
0a.10.13   42  115.87    0.00  ....     .   47.09  61.39   183   68.79  12.76   387    0.00  ....     .    0.00  ....     .
0a.10.15   49  117.58    0.00  ....     .   48.85  59.25   246   68.74  12.76   527    0.00  ....     .    0.00  ....     .
0a.10.2    59  104.88   15.03  1.02 39509   65.23  41.53   385   24.63   7.53  1887    0.00  ....     .    0.00  ....     .
0a.10.3    50   97.65   16.48  1.02 37285   62.54  43.32   368   18.63  10.53  1049    0.00  ....     .    0.00  ....     .
0a.10.4    50   96.12   15.57  1.02 34450   62.29  43.83   360   18.27   9.13  1229    0.00  ....     .    0.00  ....     .
0a.10.5    49   95.43   16.14  1.02 36738   61.30  44.44   350   17.99   9.48  1125    0.00  ....     .    0.00  ....     .
0a.10.6    50   97.65   16.02  1.02 36134   62.41  43.57   359   19.22   9.24  1177    0.00  ....     .    0.00  ....     .
0a.10.7    49   95.28   15.21  1.02 36052   61.87  44.00   358   18.20   9.55  1085    0.00  ....     .    0.00  ....     .
0a.10.8    49   97.18   15.88  1.02 35157   62.44  43.51   363   18.86   9.69  1100    0.00  ....     .    0.00  ....     .
0a.10.9    50   95.67   16.17  1.02 36281   61.38  44.28   352   18.12   9.97  1124    0.00  ....     .    0.00  ....     .
0a.10.1    53  100.78   16.73  1.02 37001   64.45  42.40   376   19.60   9.46  1197    0.00  ....     .    0.00  ....     .
0a.10.0    52   99.00   16.14  1.02 38089   63.48  43.05   365   19.38   9.50  1232    0.00  ....     .    0.00  ....     .
/sata_wh/plex0/rg1:
0a.10.10   40  108.60    0.00  ....     .   45.42  62.65   186   63.18  14.06   363    0.00  ....     .    0.00  ....     .
0a.10.12   44  109.27    0.00  ....     .   46.11  61.74   232   63.16  14.05   467    0.00  ....     .    0.00  ....     .
0a.10.14   50   97.84   15.16  1.00 33412   63.47  41.64   432   19.21  10.57  1077    0.00  ....     .    0.00  ....     .
0a.10.16   50   96.24   14.95  1.01 33250   62.32  42.38   411   18.96  10.53  1072    0.00  ....     .    0.00  ....     .
0a.10.17   50   96.26   15.25  1.01 34504   62.32  42.50   410   18.68  10.17  1101    0.00  ....     .    0.00  ....     .
0a.10.18   50   95.81   16.37  1.00 34363   61.09  43.29   418   18.35  10.21  1167    0.00  ....     .    0.00  ....     .
0a.10.19   50   94.00   14.66  1.00 32563   61.32  43.23   411   18.02  10.28  1068    0.00  ....     .    0.00  ....     .
0a.10.20   50   94.74   15.87  1.00 33920   60.97  43.50   407   17.90  10.90  1052    0.00  ....     .    0.00  ....     .
0a.10.21   50   94.08   14.95  1.00 32229   60.83  43.63   414   18.29  10.18  1152    0.00  ....     .    0.00  ....     .
0a.10.22   50   95.56   15.55  1.00 34687   61.55  43.07   418   18.46  10.57  1128    0.00  ....     .    0.00  ....     .
0a.10.23   50   94.56   15.67  1.00 32724   60.64  43.77   410   18.25  10.51  1145    0.00  ....     .    0.00  ....     .
/aggr0_fas5n/plex0/rg0:
0a.10.0    52   99.00   16.14  1.02 38089   63.48  43.05   365   19.38   9.50  1232    0.00  ....     .    0.00  ....     .
0a.10.1    53  100.78   16.73  1.02 37001   64.45  42.40   376   19.60   9.46  1197    0.00  ....     .    0.00  ....     .
0a.10.2    59  104.88   15.03  1.02 39509   65.23  41.53   385   24.63   7.53  1887    0.00  ....     .    0.00  ....     .
0a.10.3    50   97.65   16.48  1.02 37285   62.54  43.32   368   18.63  10.53  1049    0.00  ....     .    0.00  ....     .
0a.10.4    50   96.12   15.57  1.02 34450   62.29  43.83   360   18.27   9.13  1229    0.00  ....     .    0.00  ....     .
0a.10.5    49   95.43   16.14  1.02 36738   61.30  44.44   350   17.99   9.48  1125    0.00  ....     .    0.00  ....     .
0a.10.6    50   97.65   16.02  1.02 36134   62.41  43.57   359   19.22   9.24  1177    0.00  ....     .    0.00  ....     .
0a.10.7    49   95.28   15.21  1.02 36052   61.87  44.00   358   18.20   9.55  1085    0.00  ....     .    0.00  ....     .
0a.10.8    49   97.18   15.88  1.02 35157   62.44  43.51   363   18.86   9.69  1100    0.00  ....     .    0.00  ....     .
0a.10.9    50   95.67   16.17  1.02 36281   61.38  44.28   352   18.12   9.97  1124    0.00  ....     .    0.00  ....     .
Aggregate statistics:
Minimum    40   94.00    0.00               45.42                17.90                 0.00                 0.00
Mean       50   99.08   13.85               60.50                24.72                 0.00                 0.00
Maximum    59  117.58   16.73               65.23                68.79                 0.00                 0.00
Spares and other disks:
0a.10.11    0    0.00    0.00  ....     .    0.00  ....     .    0.00  ....     .      0.00  ....     .    0.00  ....     .

FCP Statistics (per second)
0.00 FCP Bytes recv  0.00 FCP Bytes sent  0.00 FCP ops

iSCSI Statistics (per second)
0.00 iSCSI Bytes recv  0.00 iSCSI Bytes xmit  0.00 iSCSI ops

Tape Statistics (per second)
tape            write bytes  blocks   read bytes  blocks
BC6505-1:3.126         0.00    0.00         0.00    0.00
BC6505-1:7.126         0.00    0.00         0.00    0.00
BC6505-1:4.126         0.00    0.00         0.00    0.00
BC6505-1:8.126         0.00    0.00         0.00    0.00

Interrupt Statistics (per second)
1585.72 MSI[256](3:49) (1,0,0) PMC SAS/SATA Controller 8001
0.27 MSIX[258](2:50) (3,0,0) QLogic EP 8324 PCI-Express FC controller
0.23 MSIX[260](0:50) (3,0,1) QLogic EP 8324 PCI-Express FC controller
1.06 MSIX[261](1:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
3.06 MSIX[262](2:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
2560.91 MSIX[263](3:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
1.51 MSIX[264](0:51) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
1.67 MSIX[265](1:52) (4,0,0) QLogic EP 8324 PCI-Express FCoE NIC controller
1.06 MSIX[266](2:52) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
0.44 MSIX[267](3:52) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
128.64 MSIX[268](0:53) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
1.31 MSIX[269](1:53) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
1.81 MSIX[270](2:53) (4,0,1) QLogic EP 8324 PCI-Express FCoE NIC controller
14.72 MSI[277](1:55) (5,0,2) 82580 Quad Copper Gigabit
0.18 MSI[278](2:55) (5,0,3) 82580 Quad Copper Gigabit
0.00 RTC
0.00 IPI
999.37 Msec Clock
5301.96 total

NVRAM Statistics (per second)
0.00 total dma transfer KB  210029.27 wafl write req data KB  0.00 dma transactions  0.00 dma destriptors  0.00 waitdone preempts  0.00 waitdone delays  84.87 transactions not queued  3295.91 transactions queued  3302.55 transactions done  5929480.25 total ldma waittime (MS)  5205488476.82 total rdma waittime (MS)  29.39 completion wakeups  210030.95 total nvlog KB  0.00 total nolog KB  0.00 empty entry descriptor pool  0.00 channel1 dma transfer KB  0.00 channel1 dma transactions  0.00 channel1 dma descriptors

FlexLog Statistics
Initiate transfer latency
xfer size <=      count   lcl avg   lcl max   rmt avg   rmt max
       16:     44534094      1.65    104.03      0.15     16.17
      256:     14452544      6.93    224.71      0.03     10.34
     4096:      6592771      1.75     81.59      0.00      0.00
    65536:         1881      2.93     15.72      0.12      0.12
  1048576:            0      0.00      0.00      0.00      0.00
---------------------------
ldma isdone: 2.64 us  Max: 94.01 us
ldma waitdone: 6.91 us  Max: 177.92 us
rdma isdone: 0.00 us  Max: 0.00 us
rdma waitdone: 0.00 us  Max: 0.00 us
BE Resources Depleted Cnt: 0
No Bucket Ready Cnt: 0
That's a ton of information and I can't find the bottleneck.
sysstat -x's output shows the bottleneck quite clearly under CP (Consistency Point) Time - it is spending almost 100% of its time flushing data to disk. CP Type shows why it is flushing to disk.
This document explains Consistency Points and the impact on performance - https://kb.netapp.com/support/s/article/ka21A0000000jpHQAQ/faq-consistency-point?language=en_US
The FAS2554 has 1280MB of NVRAM - but it is mirroring the other controller, so it has 640MB available. Once it hits a high watermark for that 640MB, it flushes to disk. While it is flushing to disk, any incoming writes are held.
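As a rough illustration only (the 640MB figure is from above, but the watermark fraction and incoming rate are assumptions, and the real trigger logic is more involved than this):

# toy model of how often a watermark-triggered CP would fire at this write rate
usable_nvram_mb = 640     # assumed usable share of the FAS2554 NVRAM
watermark = 0.5           # assumed high-watermark fraction that kicks off a CP
incoming_mb_s = 250       # roughly the "Net in" rate from your sysstat -x output

print(f"~{usable_nvram_mb * watermark / incoming_mb_s:.1f} s between watermark CPs")   # ~1.3 s
# which lines up with the 'H' (high watermark) CPs showing up in almost every 1-second sample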
Sorry to say, but I don't think you're likely to hit much higher than 200MByte/sec. Our internal modeling tools suggest you might get a little bit higher, but not too much.
Hi,
if the speed is limited by the disk flushing process here, why is disk utilization only at 40 to 50%? Is there another limiting factor between NVRAM and disks?
Would the write speed increase if we increased the aggregate's disk count by adding another shelf (e.g. a shelf with 20 4TB disks and 4 SSDs)? (The controller is a single one; it's not HA, since this system is only for secondary backup purposes.)
Hmm, OK, as a non-HA system you get more NVRAM.
But you still have 70-80%+ disk utilisation in the sysstat -x output.
Flash Pool helps more with reads than writes (it only works for overwrites) - so no point in a hybrid shelf.
With the system checkpointing at the high watermark almost every second, I don't believe adding disks will help significantly. I would suggest ensuring your system is running 8.3.2P11 and checking again after the ONTAP upgrade.
Hmm,
I patched the filer to 8.3.2P11. I will repeat the measurement, but I can't do it right now because after the reboot the filer started to update all its SnapMirror relationships. (The filer is also a SnapMirror destination for some volumes on other filers. These transfers usually run at night, in different time windows than the ESXi transfers.)
I would expect that with more disks the filer could flush NVRAM to disk and complete a CP faster, so performance would rise.
The system won't CP more often than once per second, and it's running a CP every second because the high watermark for the NVRAM is full, and the CPs are mostly (from the small sample) finishing within the second. If the CPs were taking longer than a second, yes, more disks might help.
I hear you about more disks enabling the system to return from CP quicker - but then we are back to the NVRAM filling and needing to flush again on the same schedule.
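To make that concrete with a deliberately oversimplified model (a single watermark-sized buffer and made-up numbers; the real behaviour with the second NVRAM half is messier):

# sustained write rate is capped either by the clients' offered rate or by how fast
# a watermark's worth of NVRAM can be flushed - whichever is worse
def sustained_mb_s(offered_mb_s, watermark_mb, flush_seconds):
    fill_seconds = watermark_mb / offered_mb_s
    if flush_seconds <= fill_seconds:      # CP finishes before the next one is due
        return offered_mb_s                # the ingest rate is the limit, not the disks
    return watermark_mb / flush_seconds    # back-to-back CPs: the disks set the pace

print(sustained_mb_s(250, 320, 1.0))   # 250 -> the disks already keep up
print(sustained_mb_s(250, 320, 0.5))   # 250 -> halving the flush time changes nothing
print(sustained_mb_s(250, 320, 2.0))   # 160 -> only here would more spindles pay off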
I will disclaim that others may see something else in the stats you have posted, but this is as far as my experience would dig.
Again I will bow down to the performance experts, however a couple of comments from the stats you've provided...
I may be reading this wrong, however from the sysstat output there appears to be a larger amount of disk writes than net in, suggesting a possible misalignment on the VMware side. Are you able to check this: https://kb.netapp.com/support/s/article/ka31A0000000x5ZQAQ/how-to-identify-misalignment-over-nfs-on-the-storage-system?language=en_US
Also, from the statit output the xfers (IOPS) of the disks are up at around 100 per disk. This is generally the most you can get out of a SATA disk, so they look to be running flat out, despite what the ut% is stating. This being the case, more disks may actually help; however, since most of the new writes will go to the new disks, this may initially adversely affect performance, see next paragraph...
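To put rough numbers on that (simple arithmetic on the statit disk columns above; the data-disk count assumes RAID-DP, i.e. 2 parity disks per RAID group):

# per-disk write throughput implied by the statit disk table above
writes_per_sec = 62        # typical "writes" xfers/s on one of your data disks
blocks_per_chain = 43      # typical write chain length (4K blocks per command)
data_disks = 19            # 23 disks in 2 RGs minus 2 parity disks per RG

per_disk = writes_per_sec * blocks_per_chain * 4 / 1024
print(f"~{per_disk:.0f} MB/s per data disk, ~{per_disk * data_disks:.0f} MB/s across the data disks")
# ~10 MB/s per disk and ~200 MB/s in total - the same ballpark as the observed backup rate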
The data is only being written to full stripes about 60% of the time, suggesting the aggregate is filling, or was filled previously and had a lot of old data deleted. Maybe looking at a reallocate would help, both to increase the number of contiguous free blocks and, if a new RG or two are added, to even out the used space between the disks and therefore allow all spindles to be written to. See https://library.netapp.com/ecmdocs/ECMLP2348025/html/reallocate/measure.html
However, I would agree with Alex that your underlying issue is the small NVRAM in the system. If a hardware upgrade is not possible, then by removing the HA partner (and therefore removing the nodes from the cluster) you would be able to utilise the entire NVRAM.
Thanks,
Grant.
Hi,
thanks to both of you for your input. It's a single system, so it doesn't have to share its 2GB NVRAM.
I did a "reallocate measure (path to my volume) -once". The result in event log is "Allocation measurement check on '/vol/backup2nfs' is 2.". As far as I know this is a good value.
But free space reallocation at the aggregate level is set to "off". The aggregate was filled up to about 90% some weeks ago (mainly due to large, mostly empty thick-provisioned volumes), but is now down to 65%.
About misalignment:
VMs are not active in this NFS volume. Their VMDKs are only copied to the volume for backup. I always thought misalignment is a problem if I write inside a VM which lies misaligned in a volume. "nfsstat -d" does not seem to exist in cDOT, even at the debug command level. It seems to be a 7-Mode command.
Hi, you are correct, a reallocate measure of 2 suggests a very healthy layout of the used blocks. However, it does appear your free space is not as contiguous as it could be (typical of a previously full aggregate); maybe scheduling a reallocate would assist here, however I do believe this would only have a slight impact. Please see https://kb.netapp.com/support/s/article/ka31A0000000xBcQAI/how-to-set-a-reallocate-schedule-in-clustered-data-ontap?language=en_US for details.
Regarding the misalignment, OK, sorry I misunderstood the setup. Being just an NFS volume, as you state it will not be an issue.
I'm now not sure I can assist further. I'm not sure the free space reallocate will give you enough to make a difference, and the controller is already in a single-node config, so the NVRAM is already maximised. I do believe we're at the point where you are limited by the hardware!
Good luck,
Grant.
Yes, it seems so. I will run reallocation scans over the weekend just to be sure.
Do you have an idea what causes the read requests during the writing of the files? The reads amount to about 10-15% of all written kilobytes in the sysstat output I posted on page 1.
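(Quick check of that percentage, just summing a few of the (disk read, disk write) kB/s pairs from the sysstat -x output above - hand-copied values, so only approximate:)

samples = [(29671, 213116), (37094, 319292), (42236, 349691), (41409, 376313), (21484, 306264)]
reads = sum(r for r, w in samples)
writes = sum(w for r, w in samples)
print(f"reads are ~{100 * reads / writes:.0f}% of writes")   # ~11% for these samples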
Looking at the statit output, most of the reads are of type cpreads, meaning it's performing parity calculations, hence the majority of that IO hitting the 2 parity disks in each RG. When it can't write a full stripe it needs to read the existing parity (and data) and recalculate the parity before it can write the new data.
Otherwise, likely to be background WAFL scanners (privilege command: system node run -node <node> wafl scan status).
Cheers,
Grant.