Network and Storage Protocols
Hi.
I need some help reading the output of a statit, to see if I can identify any problem on our array. It's a FAS2040 running Data ONTAP 7.3.4, with two shelves of 24 disks each, one SAS and one SATA, and two controllers with 12 disks of each type. On the SAS disks I have some LUNs presented over FCP to a few hosts, plus iSCSI LUNs for an ESX environment. On the SATA disks I have NAS shares served via CIFS.
The problem we have is that the NDMP backup window for the NAS shares (approx. 3.8TB) takes more than 31 hours; we get a transfer rate of roughly 100GB/h, which is very low. I've been watching the environment and had an open ticket with NetApp, and the only thing that came out of it was that we have very few disks (RAID-DP with 9 data disks, two parity disks and a spare). Support told us that increasing the number of disks could increase the transfer rate, and I'm trying to get an explanation for this.
I've been taking performance samples with statit for several different scenarios, and could use some help reading one of them and, if possible, an explanation of the disk bottleneck we suspect we have.
I've attached the output, along with some sysstat samples.
Thanks for the help.
JVT
I've had a quick glance at the attached file...
- I cannot see any tape activity in the sysstat or statit output; is the NDMP backup going directly to tape or is it a 3-way backup?
- The SATA disks are 20% loaded, so there is still some headroom before I'd say you need more spindles to get more throughput.
- How much throughput do you get when reading from one of the CIFS shares? You should get at least the same when running the NDMP backup.
- The sysstat hardly shows any activity; what was running while it was captured?
Peter
I have attached another statit output. I'm trying to get one more while the backup is actually dumping data; this one was taken right after launching the backup, so it must be reading inodes, building the backup catalog, etc., before dumping to the VTL we are using. What I see here is that only about 48% of the stripes written are full stripes, which I think is very low, and the partial/full stripe ratio is about 1.07. The cpreads/writes ratio is also over 1.2 on several disks in the SATA aggregate. What bothers me most, though, is that the array is only reading at about 173KB/sec during this period (it is almost doing nothing there). Could it be that the data is very fragmented on the disks?
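For reference, this is roughly how those stripe figures are derived from the "RAID Statistics (per second)" counters in statit. A minimal Python sketch of the calculation follows; since the attachment isn't reproduced here, the example values are taken from the statit pasted later in this thread.
----------------------------------------------------------------------------------
# Sketch: derive stripe-efficiency figures from statit RAID Statistics.
# Example values are from the statit pasted later in this thread
# (267.70 full stripes/s, 70.67 partial stripes/s), not from the attachment.

def stripe_efficiency(full_stripes: float, partial_stripes: float) -> None:
    """Print the full-stripe percentage and the partial/full ratio."""
    total = full_stripes + partial_stripes
    print(f"full stripes : {100.0 * full_stripes / total:.1f}% of stripes written")
    print(f"partial/full : {partial_stripes / full_stripes:.2f}")

stripe_efficiency(full_stripes=267.70, partial_stripes=70.67)
# -> full stripes : 79.1% of stripes written
# -> partial/full : 0.26
----------------------------------------------------------------------------------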
Let's have a look at one where the backup IS running (transferring data). In this one the SATA disks were busier than before, but still not maxed out.
WAFL is a "fragmented" filesystem and in most circumstances has no issues with "fragmentation" (unlike traditional, older filesystems like UFS or NTFS).
I have read several posts about fragmentation and how it becomes a big issue for sequential read/write operations, so backups performed by any method other than snapshots are likely to be affected. I've also read that splitting the volumes to be backed up into several smaller volumes could help, as could adding more physical disks to the array, so I'm trying to figure out how I can improve this.
As soon as I get the statit taken while transferring data, I'll post it.
Hi!
I finally managed to get a reading while a backup was running and transferring data to the VTL. This is the output from the statit:
------------------------------------------------------------------------------------------------------------------------
Hostname: SHUSE-FS01 ID: 0135112970 Memory: 2816 MB
NetApp Release 7.3.4P2: Sat Sep 4 05:11:24 PDT 2010
Start time: Mon Dec 3 19:01:52 CET 2012
CPU Statistics
315.979006 time (seconds) 100 %
169.867678 system time 54 %
9.226622 rupt time 3 % (2600865 rupts x 4 usec/rupt)
160.641056 non-rupt system time 51 %
462.090332 idle time 146 %
150.136114 time in CP 48 % 100 %
6.041689 rupt time in CP 4 % (1561910 rupts x 4 usec/rupt)
Multiprocessor Statistics (per second)
cpu0 cpu1 total
sk switches 59900.95 56480.88 116381.83
hard switches 34656.38 39071.03 73727.41
domain switches 502.97 705.21 1208.18
CP rupts 4463.69 479.39 4943.08
nonCP rupts 2761.82 526.23 3288.05
IPI rupts 63.27 5.57 68.84
grab kahuna 0.23 0.28 0.51
grab w_xcleaner 0.00 71.94 71.94
grab kahuna usec 2.29 0.90 3.19
grab w_xcleaner usec 0.00 21738.43 21738.43
CP rupt usec 18316.76 803.78 19120.54
nonCP rupt usec 9435.77 643.80 10079.57
idle 776445.65 685962.69 1462408.34
kahuna 0.00 223157.23 223157.23
storage 38325.61 12057.61 50383.21
exempt 47537.74 31787.76 79325.50
raid 34005.82 11549.42 45555.24
target 4610.26 4937.77 9548.03
netcache 0.00 0.00 0.00
netcache2 0.00 0.00 0.00
cifs 23013.17 15125.81 38138.99
wafl_exempt 0.00 0.00 0.00
wafl_xcleaner 0.00 0.00 0.00
sm_exempt 31.37 19.85 51.22
cluster 0.00 0.00 0.00
protocol 0.00 0.00 0.00
nwk_exclusive 0.00 0.00 0.00
nwk_exempt 0.00 0.00 0.00
nwk_legacy 48277.83 13954.28 62232.11
nwk_ctx1 0.00 0.00 0.00
nwk_ctx2 0.00 0.00 0.00
nwk_ctx3 0.00 0.00 0.00
nwk_ctx4 0.00 0.00 0.00
120.076056 seconds with one or more CPUs active ( 38%)
76.889564 seconds with one CPU active ( 24%)
43.186492 seconds with both CPUs active ( 14%)
Domain Utilization of Shared Domains (per second)
0.00 idle 0.00 kahuna
0.00 storage 0.00 exempt
0.00 raid 0.00 target
0.00 netcache 0.00 netcache2
0.00 cifs 0.00 wafl_exempt
0.00 wafl_xcleaner 0.00 sm_exempt
0.00 cluster 0.00 protocol
0.00 nwk_exclusive 0.00 nwk_exempt
0.00 nwk_legacy 0.00 nwk_ctx1
0.00 nwk_ctx2 0.00 nwk_ctx3
0.00 nwk_ctx4
CSMP Domain Switches (per second)
From\To idle kahuna storage exempt raid target netcache netcache2 cifs wafl_exempt wafl_xcleaner sm_exempt cluster protocol nwk_exclusive nwk_exempt nwk_legacy nwk_ctx1 nwk_ctx2 nwk_ctx3 nwk_ctx4
idle 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
kahuna 0.00 0.00 11.34 0.96 61.34 1.02 0.00 0.00 195.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 57.42 0.00 0.00 0.00 0.00
storage 0.00 11.34 0.00 0.00 274.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.32 0.00 0.00 0.00 0.00
exempt 0.00 0.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.42 0.00 0.00 0.00 0.00
raid 0.00 61.34 274.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
target 0.00 1.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.09 0.00 0.00 0.00 0.00
netcache 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
netcache2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cifs 0.00 195.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
wafl_exempt 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
wafl_xcleaner 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sm_exempt 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cluster 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
protocol 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_exclusive 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_exempt 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_legacy 0.00 57.42 2.32 0.42 0.00 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_ctx1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_ctx2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_ctx3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nwk_ctx4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Miscellaneous Statistics (per second)
73727.41 hard context switches 0.07 NFS operations
1822.92 CIFS operations 0.00 HTTP operations
0.00 NetCache URLs 0.00 streaming packets
7524.12 network KB received 4445.83 network KB transmitted
24311.18 disk KB read 14180.27 disk KB written
9675.15 NVRAM KB written 0.00 nolog KB written
2118.47 WAFL bufs given to clients 0.00 checksum cache hits ( 0%)
0.00 no checksum - partial buffer 154.69 FCP operations
79.51 iSCSI operations
WAFL Statistics (per second)
3604.54 name cache hits ( 98%) 88.36 name cache misses ( 2%)
86664.68 buf hash hits ( 86%) 14148.92 buf hash misses ( 14%)
12829.76 inode cache hits ( 100%) 13.07 inode cache misses ( 0%)
12738.01 buf cache hits ( 88%) 1756.80 buf cache misses ( 12%)
145.96 blocks read 5578.99 blocks read-ahead
1082.83 chains read-ahead 138.71 dummy reads
3855.41 blocks speculative read-ahead 2851.32 blocks written
12.05 stripes written 0.00 blocks over-written
0.03 wafl_timer generated CP 0.00 snapshot generated CP
0.00 wafl_avail_bufs generated CP 0.00 dirty_blk_cnt generated CP
0.03 full NV-log generated CP 0.05 back-to-back CP
0.00 flush generated CP 0.13 sync generated CP
0.00 wafl_avail_vbufs generated CP 0.03 deferred back-to-back CP
0.00 container-indirect-pin CP 0.00 low mbufs generated CP
0.00 low datavecs generated CP 11773.29 non-restart messages
91.64 IOWAIT suspends 122333146.43 next nvlog nearly full msecs
0.00 dirty buffer susp msecs 52.39 nvlog full susp msecs
565192 buffers
RAID Statistics (per second)
408.53 xors 0.00 long dispatches [0]
0.00 long consumed [0] 0.00 long consumed hipri [0]
0.00 long low priority [0] 0.00 long high priority [0]
0.00 long monitor tics [0] 0.00 long monitor clears [0]
0.00 long dispatches [1] 0.00 long consumed [1]
0.00 long consumed hipri [1] 0.00 long low priority [1]
0.00 long high priority [1] 0.00 long monitor tics [1]
0.00 long monitor clears [1] 18 max batch
8.56 blocked mode xor 130.55 timed mode xor
2.53 fast adjustments 1.07 slow adjustments
0 avg batch start 0 avg stripe/msec
13.25 tetrises written 0.00 master tetrises
0.00 slave tetrises 338.36 stripes written
70.67 partial stripes 267.70 full stripes
2867.78 blocks written 140.38 blocks read
5.99 1 blocks per stripe size 9 2.40 2 blocks per stripe size 9
1.67 3 blocks per stripe size 9 1.99 4 blocks per stripe size 9
3.42 5 blocks per stripe size 9 5.56 6 blocks per stripe size 9
12.85 7 blocks per stripe size 9 36.79 8 blocks per stripe size 9
267.70 9 blocks per stripe size 9
Network Interface Statistics (per second)
iface side bytes packets multicasts errors collisions pkt drops
e0P recv 20.56 0.18 0.05 0.00 0.00
xmit 12.51 0.14 0.00 0.00 0.00
e0a recv 161035.45 1006.36 0.00 0.00 0.00
xmit 289917.84 535.97 0.04 0.00 0.00
e0b recv 6466822.64 5105.37 0.00 0.00 0.00
xmit 2994190.18 4591.79 0.03 0.00 0.00
e0c recv 1070085.88 1644.35 0.00 0.00 0.00
xmit 1154351.26 1552.51 0.03 0.00 0.00
e0d recv 6738.45 42.52 0.00 0.00 0.00
xmit 114060.17 105.06 0.00 0.00 0.00
vh recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00
vif01 recv 7707491.93 7778.64 3.54 0.00 0.00
xmit 4485364.45 6714.86 0.11 0.00 0.00
vif02 recv 6878.26 43.47 0.01 0.00 0.00
xmit 118060.92 108.36 0.00 0.00 0.00
Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data-transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round-trip time per 4K block.
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr0_SASdisks/plex0/rg0:
0d.01.0 3 8.10 0.93 1.06 10118 5.81 16.69 182 1.35 3.78 469 0.00 .... . 0.00 .... .
0d.01.2 3 8.33 0.92 1.06 13291 6.04 16.15 184 1.37 4.19 458 0.00 .... . 0.00 .... .
0d.01.4 25 73.77 66.56 3.12 4884 5.05 17.03 530 2.16 4.43 1432 0.00 .... . 0.00 .... .
0d.01.6 24 72.30 65.45 3.17 4707 4.54 18.83 546 2.32 4.76 1446 0.00 .... . 0.00 .... .
0d.01.8 25 72.08 65.34 3.24 4746 4.59 18.56 581 2.15 4.78 1370 0.00 .... . 0.00 .... .
0d.01.10 24 72.49 65.72 3.26 4704 4.58 18.61 560 2.18 4.78 1283 0.00 .... . 0.00 .... .
0d.01.12 24 72.11 65.46 3.20 4802 4.53 18.87 568 2.11 4.98 1510 0.00 .... . 0.00 .... .
0d.01.14 24 73.16 66.09 3.16 4718 4.70 17.94 592 2.37 5.00 1242 0.00 .... . 0.00 .... .
0d.01.16 25 73.00 66.21 3.16 4889 4.62 18.54 614 2.16 4.60 1694 0.00 .... . 0.00 .... .
0d.01.18 25 73.88 67.18 3.16 4795 4.47 19.14 568 2.22 4.68 1337 0.00 .... . 0.00 .... .
0d.01.20 24 72.54 65.73 3.13 4863 4.58 18.55 601 2.23 4.84 1463 0.00 .... . 0.00 .... .
/aggr1_SATAdisks/plex0/rg0:
0d.02.2 8 11.18 0.58 1.00 16228 9.25 26.17 399 1.35 6.16 671 0.00 .... . 0.00 .... .
0d.02.18 8 11.39 0.58 1.00 28214 9.51 25.51 424 1.31 5.29 814 0.00 .... . 0.00 .... .
0d.02.22 80 99.85 88.91 5.03 7772 9.26 25.32 1357 1.67 4.69 5874 0.00 .... . 0.00 .... .
0d.02.4 77 98.44 88.15 5.04 7084 8.69 26.87 1303 1.60 6.09 3727 0.00 .... . 0.00 .... .
0d.02.6 78 98.17 87.79 5.05 7206 8.74 26.70 1283 1.64 5.82 4108 0.00 .... . 0.00 .... .
0d.02.8 78 97.10 86.95 5.11 7108 8.63 27.10 1324 1.52 5.60 4260 0.00 .... . 0.00 .... .
0d.02.10 77 97.71 87.38 5.02 7295 8.69 26.76 1341 1.65 6.29 3969 0.00 .... . 0.00 .... .
0d.02.12 78 99.41 89.02 5.00 7469 8.77 26.57 1330 1.62 5.53 4288 0.00 .... . 0.00 .... .
0d.02.14 78 98.23 88.11 5.03 7235 8.66 27.01 1278 1.46 5.88 4100 0.00 .... . 0.00 .... .
0d.02.16 77 97.74 87.05 5.03 7208 8.81 26.08 1330 1.88 7.13 3392 0.00 .... . 0.00 .... .
0d.02.20 77 98.03 87.43 5.00 7278 8.78 26.54 1301 1.82 5.57 4240 0.00 .... . 0.00 .... .
Aggregate statistics:
Minimum 3 8.10 0.58 4.47 1.31 0.00 0.00
Mean 43 71.77 63.07 6.88 1.82 0.00 0.00
Maximum 80 99.85 89.02 9.51 2.37 0.00 0.00
Spares and other disks:
0d.01.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.3 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.5 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.7 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.9 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.11 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.13 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.15 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.19 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.21 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.22 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.01.23 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.3 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.5 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.7 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.9 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.11 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.13 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.15 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.19 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.21 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0d.02.23 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
FCP Statistics (per second)
2792330.86 FCP Bytes recv 3958402.35 FCP Bytes sent
154.69 FCP ops
iSCSI Statistics (per second)
1303016.39 iSCSI Bytes recv 1195745.34 iSCSI Bytes xmit
79.51 iSCSI ops
Tape Statistics (per second)
tape write bytes blocks read bytes blocks
SHUSE-SAN01:7.125 9849304.89 37.57 0.00 0.00
SHUSE-SAN01:7.125L1 1659.25 0.01 0.00 0.00
SHUSE-SAN01:7.125L2 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L3 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L4 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L5 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L6 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L7 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L8 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L9 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L10 0.00 0.00 0.00 0.00
SHUSE-SAN01:7.125L11 0.00 0.00 0.00 0.00
Interrupt Statistics (per second)
2000.03 Clock (IRQ 0) 4061.30 PCI direct (IRQ 16)
2100.60 PCI direct (IRQ 17) 0.00 RTC
68.84 IPI 8230.77 total
NVRAM Statistics (per second)
0.00 total dma transfer KB 0.00 wafl write req data KB
0.00 dma transactions 0.00 dma descriptors
2787.38 waitdone preempts 0.01 waitdone delays
0.02 transactions not queued 335.84 transactions queued
336.80 transactions done 42.81 total waittime (MS)
1479.39 completion wakeups 197.86 nvdma completion wakeups
118.72 nvdma completion waitdone 9674.19 total nvlog KB
0.00 nvlog shadow header array full 0.00 channel1 dma transfer KB
0.00 channel1 dma transactions 0.00 channel1 dma descriptors
E7520 Data Mover Statistics (per second)
10334.55 total dma transfer KB 4.94 total bcopy transfer KB
2.60 total waittime (MS)
------------------------------------------------------------------------------------------------------------------------
I also got some output from a sysstat that was running at the same time as the statit:
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
26% 0 1947 0 2124 18008 2455 20341 2475 0 0 8s 67% 13% Fn 73% 136 41 287 439 306 1181
71% 0 3462 0 3564 66792 2633 13299 89953 0 0 28s 95% 95% Fn 84% 80 22 342 303 281 276
71% 0 3618 0 3723 64474 3447 19938 78679 0 0 28s 92% 61% Fn 78% 75 30 262 181 230 904
65% 0 3437 0 3535 62042 2941 14108 82709 0 0 4s 94% 89% F 82% 69 29 283 146 352 395
70% 0 3449 0 3570 67054 3342 18552 83138 0 0 30s 93% 84% Ff 85% 86 35 464 74 308 802
17% 0 1852 0 1956 960 2431 22146 0 0 0 4s 62% 0% - 85% 55 49 206 38 540 958
17% 1 1625 0 1805 700 2590 23444 12 0 0 8s 64% 0% - 83% 161 18 594 2425 369 0
19% 0 1707 0 2427 903 2403 27926 0 0 0 8s 65% 0% - 84% 667 53 904 4038 514 1187
48% 1 3718 0 8367 2324 73504 87383 21052 0 0 3s 92% 100% :f 79% 4611 37 247 18768 218 701
33% 0 3156 0 3230 1554 37708 45966 7364 0 0 3s 81% 99% Zf 94% 47 27 432 9 225 197
32% 0 3168 0 3245 1612 44303 58272 3710 0 0 2s 78% 99% Zf 98% 45 32 726 17 143 985
31% 0 3233 0 3311 1830 43011 50216 5395 0 0 54s 85% 99% Zf 100% 58 20 481 16 386 0
34% 0 3611 0 3750 1896 48525 56898 3962 0 0 57s 82% 99% Zf 98% 78 61 945 20 289 852
30% 0 3106 0 3203 1699 43288 58432 5382 0 0 58s 86% 99% Zn 95% 63 34 774 2 277 335
32% 0 2992 0 3104 1880 50770 66186 5510 0 0 59s 87% 99% Zn 98% 58 54 514 18 317 66
29% 0 3019 0 3209 1630 41634 55848 6352 0 0 1 86% 99% Zn 100% 141 49 1023 296 253 1116
43% 0 4055 0 4315 2447 69478 74662 11116 0 0 1 89% 99% Zf 89% 196 64 4232 1454 362 453
Now I do see massive disk usage on the SATA aggregate, but there is still little traffic on the interfaces and only about 24.67MB/s of activity from the disks (at 80% disk utilization?).
Is it possible, from the statit output, to calculate the average read/write IOPS being requested from the array, so I can compare it with the "theoretical" IOPS the array should be capable of serving given the number of disks it has? Does that make sense?
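On the "24.67MB/s at 80% utilization" point, the Disk Statistics columns can be converted into per-disk figures directly, since throughput is roughly ureads x chain x 4KB and xfers is already the per-disk ops rate. Below is a minimal sketch with values hand-copied from the SATA data disks in the statit above; treat it as a back-of-envelope check, not a tool.
----------------------------------------------------------------------------------
# Sketch: convert statit "Disk Statistics" read columns into MB/s and ops/s.
# Values hand-copied from the nine SATA data disks in the statit above:
# (ureads/s, writes/s, cpreads/s, read chain in 4K blocks).

SATA_DATA_DISKS = [
    (88.91, 9.26, 1.67, 5.03),  # 0d.02.22
    (88.15, 8.69, 1.60, 5.04),  # 0d.02.4
    (87.79, 8.74, 1.64, 5.05),  # 0d.02.6
    (86.95, 8.63, 1.52, 5.11),  # 0d.02.8
    (87.38, 8.69, 1.65, 5.02),  # 0d.02.10
    (89.02, 8.77, 1.62, 5.00),  # 0d.02.12
    (88.11, 8.66, 1.46, 5.03),  # 0d.02.14
    (87.05, 8.81, 1.88, 5.03),  # 0d.02.16
    (87.43, 8.78, 1.82, 5.00),  # 0d.02.20
]

iops = sum(u + w + c for u, w, c, _ in SATA_DATA_DISKS)
read_mb_s = sum(u * chain * 4 / 1024 for u, _, _, chain in SATA_DATA_DISKS)

print(f"back-end ops/s on the SATA data disks : ~{iops:.0f}")
print(f"user read throughput for those disks  : ~{read_mb_s:.1f} MB/s")
# -> roughly 880 ops/s but only ~15-16 MB/s of reads: each disk is doing close
#    to 100 ops/s, yet with an average chain of only ~5 x 4K blocks per read
#    the resulting bandwidth stays low. Add the SAS aggregate's reads and you
#    land close to the ~24MB/s "disk KB read" figure in the statit.
----------------------------------------------------------------------------------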
I'd say the SATA disks are the bottleneck (no surprise). You also have a lot of CIFS IOPS while the backup is running, which slows it down further because a lot of CPs are being generated. Maybe you can move the CIFS activity and the backup activity to different timeslots; that would certainly help.
Comparing the current IOPS with the "theoretical" IOPS is difficult, but it can be done. I'd recommend getting someone from NetApp or a partner company with performance troubleshooting experience involved at this stage.
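As a very rough illustration of that comparison, a sketch follows. The per-spindle figure is only the usual rule of thumb for 7.2k SATA under random I/O, not a NetApp-published number, so take the result as an order-of-magnitude check.
----------------------------------------------------------------------------------
# Very rough observed-vs-theoretical comparison for the SATA RAID group.
# Assumption: ~75 random IOPS per 7.2k SATA spindle (rule of thumb only).

RULE_OF_THUMB_SATA_IOPS = 75   # assumed random IOPS per 7.2k spindle
DATA_DISKS = 9                 # data disks in the RAID-DP group (aggr1_SATAdisks)
OBSERVED_XFERS_PER_DISK = 98   # xfers column for the SATA data disks in statit

theoretical = RULE_OF_THUMB_SATA_IOPS * DATA_DISKS
observed = OBSERVED_XFERS_PER_DISK * DATA_DISKS

print(f"rule-of-thumb random IOPS for the group : ~{theoretical}")
print(f"observed back-end ops/s during backup   : ~{observed}")
# -> ~675 vs ~880: the spindles are already working above the random-I/O rule
#    of thumb (possible because part of the I/O is chained/sequential), which
#    is consistent with the 77-80% ut% figures seen during the backup.
----------------------------------------------------------------------------------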
Hi.
Thanks for the response.
I've been doing some tests. Yesterday I created a new 1TB volume on the same SATA aggregate and copied in about 100GB of files of 1-1.5GB each. I ran a dump to null and took some statit/sysstat info. The dump completed in 6 minutes, and I measured a throughput of about 1TB/h (close to 300MB/s).
Today I did another dump to null with a production volume, specifically the userfiles share, which holds about 3TB of data in small files (around 3 million files). The dump was aborted after about one hour, by which point it had read 240GB, a throughput close to 300GB/h (88MB/s). I also took statit/sysstat info for this run.
Both volumes are on the same aggregate, meaning the same physical disks. The only thing I can "conclude" from this is that the directory structure, together with the file sizes and the number of files, is what is slowing down the read process.
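To put rough numbers on that comparison, a trivial sketch; the elapsed times are the approximate ones quoted above, so the results are only ballpark figures.
----------------------------------------------------------------------------------
# Back-of-envelope throughput of the two dump-to-null tests.

def rate_mb_s(gigabytes: float, minutes: float) -> float:
    """Average throughput in MB/s for `gigabytes` read in `minutes`."""
    return gigabytes * 1024 / (minutes * 60)

large_files = rate_mb_s(100, 6)    # new volume, 1-1.5GB files
small_files = rate_mb_s(240, 60)   # userfiles volume, ~3 million small files

print(f"large-file test : ~{large_files:.0f} MB/s")
print(f"small-file test : ~{small_files:.0f} MB/s")
# -> ~285 MB/s vs ~68 MB/s averaged over the whole hour (closer to the quoted
#    88MB/s if the mapping passes are excluded). Same spindles, so the gap is
#    down to file count and directory structure rather than raw disk speed.
----------------------------------------------------------------------------------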
Here's some output from the dump, where you can see the time spent on each Pass of the dump:
----------------------------------------------------------------------------------
DUMP: creating "/vol/usuarios/../snapshot_for_backup.511" snapshot.
DUMP: Using Full Volume Dump
DUMP: Dumping tape file 1 on null
DUMP: Date of this level 0 dump: Tue Dec 18 10:09:03 2012.
DUMP: Date of last level 0 dump: the epoch.
DUMP: Dumping /vol/usuarios to null
DUMP: mapping (Pass I)[regular files]
DUMP: mapping (Pass II)[directories]
DUMP: estimated 3080484582 KB.
DUMP: dumping (Pass III) [directories]
DUMP: Tue Dec 18 10:21:01 2012 : We have written 370385 KB.
DUMP: Tue Dec 18 10:26:01 2012 : We have written 1142394 KB.
DUMP: dumping (Pass IV) [regular files]
DUMP: Tue Dec 18 10:31:01 2012 : We have written 11003960 KB.
DUMP: Tue Dec 18 10:36:01 2012 : We have written 43909314 KB.
DUMP: Tue Dec 18 10:41:01 2012 : We have written 82547223 KB.
DUMP: Tue Dec 18 10:46:01 2012 : We have written 116505114 KB.
DUMP: Tue Dec 18 10:51:01 2012 : We have written 149442003 KB.
DUMP: Tue Dec 18 10:56:01 2012 : We have written 183890952 KB.
DUMP: Tue Dec 18 11:01:01 2012 : We have written 219154461 KB.
DUMP: Tue Dec 18 11:06:01 2012 : We have written 251863963 KB.
----------------------------------------------------------------------------------
I've read about other environments with the same array that hold many more millions of files than we do. Can I actually conclude that this is what is hurting the backups? How can I prove it (with numbers)?
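One way to get those numbers is to let the dump log speak for itself: the timestamped "We have written" lines give the throughput of each five-minute interval, and the slow intervals line up with the passes that walk directories and small files. A small Python sketch follows; it only parses the log format shown above.
----------------------------------------------------------------------------------
# Sketch: compute per-interval throughput from the timestamped
# "We have written" lines of a dump log (format as shown above).

import re
from datetime import datetime

LOG = """\
DUMP: Tue Dec 18 10:21:01 2012 : We have written 370385 KB.
DUMP: Tue Dec 18 10:26:01 2012 : We have written 1142394 KB.
DUMP: Tue Dec 18 10:31:01 2012 : We have written 11003960 KB.
DUMP: Tue Dec 18 10:36:01 2012 : We have written 43909314 KB.
DUMP: Tue Dec 18 10:41:01 2012 : We have written 82547223 KB.
"""

PATTERN = re.compile(r"DUMP: (\w{3} \w{3}\s+\d+ [\d:]+ \d{4}) : We have written (\d+) KB")

samples = []
for line in LOG.splitlines():
    match = PATTERN.search(line)
    if match:
        when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        samples.append((when, int(match.group(2))))

for (t0, kb0), (t1, kb1) in zip(samples, samples[1:]):
    seconds = (t1 - t0).total_seconds()
    print(f"{t0:%H:%M} -> {t1:%H:%M} : {(kb1 - kb0) / 1024 / seconds:6.1f} MB/s")
# -> ~2.5 MB/s while still in Pass III (directories), 30-125 MB/s once Pass IV
#    (regular files) gets going. Running this over the full log of a production
#    backup would show exactly where the 31-hour window is being spent.
----------------------------------------------------------------------------------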