Network and Storage Protocols

Help with statit interpretation

JOSE_TOME

Hi.

I need some help reading the output of a statit to see if I can identify any problem on our array. It's a FAS2040, ONTAP 7.3.4, with two shelves of 24 disks each, one SAS and the other SATA. The two controllers each own 12 disks of each type. On the SAS disks I have some LUNs presented via FCP to a few hosts, plus iSCSI LUNs for an ESX environment. On the SATA disks I have NAS shares via CIFS.

The problem we have is that the backup window for the NAS shares via NDMP takes more than 31 hours (approx. 3.8TB); we get roughly 100GB/h of transfer rate, which is very low. I've been watching the environment and had an open ticket with NetApp, and the only thing that came out of it was that we have very few disks (RAID-DP with 9 data disks, two parity and a spare). Support told us that increasing the number of disks could raise the transfer rate, and I'm trying to get an explanation for why.
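
For reference, the arithmetic behind that rate as a quick Python sketch (the 3.8TB and 31 hour figures are the approximate ones quoted above):

----------------------------------------------------------------------------------
# Rough backup-window arithmetic (sketch only; figures are approximate).
dataset_gb = 3.8 * 1024        # ~3.8 TB expressed in GB
window_h   = 31                # observed backup window, in hours

rate_gb_per_h = dataset_gb / window_h
rate_mb_per_s = rate_gb_per_h * 1024 / 3600

print(f"{rate_gb_per_h:.0f} GB/h  ~=  {rate_mb_per_s:.1f} MB/s")
# -> ~126 GB/h ~= 35.7 MB/s at exactly 31 h; the ~100 GB/h we actually see
#    (~28 MB/s) implies the window is closer to 38-39 h for 3.8 TB.
----------------------------------------------------------------------------------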

I've been taking performance samples with statit for several different scenarios, and could use some help reading one of them and, if possible, an explanation of the HDD bottleneck we suspect we have.

I've attached the output, along with some sysstat samples.

Thanks for the help.

JVT

8 REPLIES

peter_lehmann

I've had a quick glance at the attached file...

- I cannot see any tape activity in sysstat or statit; is the NDMP backup a direct one to tape or a 3-way?

- The SATA disks are 20% loaded, so still some headroom before I'd say you need more spindles to get more throughput.

- How much throughput do you get when reading from one of the CIFS shares? You should get at least the same when running NDMP Backup.

- The sysstat seems to show nothing; what was running while it was taken?

Peter

JOSE_TOME

I have attached another statit output, and I'm trying to capture one more while the backup is actually dumping data. This one was taken right when the backup launched, so it must be reading inodes, building the backup catalog, etc., before dumping to the VTL we are using. What I see here is that only about 48% of the stripes written are full stripes, which I think is very low; the partial/full stripe ratio is about 1.07. Also, the cpreads/writes ratio is over 1.2 on several disks in the SATA aggregate, but what bothers me is that the array is only reading at 173KB/sec during this window (almost doing nothing there). Could it be that the data is heavily fragmented on disk?
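
To sanity-check those two figures against each other, here is a minimal sketch of the calculation from the statit "RAID Statistics" counters ("partial stripes" and "full stripes" written per second); the 1.07 ratio is the one quoted above:

----------------------------------------------------------------------------------
# Sketch: stripe efficiency from the statit RAID counters (values quoted above).
partial_per_full = 1.07                      # partial/full stripe ratio
full_share = 1.0 / (1.0 + partial_per_full)  # fraction of stripes written full

print(f"full-stripe share: {full_share:.0%}")  # -> 48%, matching the figure above
----------------------------------------------------------------------------------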

peter_lehmann

Let's have a look at one where the backup IS running (transferring data). In this one the SATA disks were busier than before, but still not maxed out.

WAFL is a "fragmented" filesystem and in most circumstances has no issues with "fragmentation" (unlike traditional, older filesystems like UFS or NTFS).

JOSE_TOME

I have read several posts about fragmentation and how it becomes a big issue for sequential read/write operations, so backups performed by any method other than snapshots are likely to be affected. I've also read that splitting the data to be backed up into several "smaller" volumes could help, as could adding more physical disks to the array. So I'm trying to figure out how I can improve this.

As soon as I get the statit taken while transferring data, I'll post it.

JOSE_TOME

Hi!

I finally managed to get a reading while a backup is running and transferring data to the VTL. This is the output from the statit:

------------------------------------------------------------------------------------------------------------------------

Hostname: SHUSE-FS01  ID: 0135112970  Memory: 2816 MB
  NetApp Release 7.3.4P2: Sat Sep  4 05:11:24 PDT 2010
  Start time: Mon Dec  3 19:01:52 CET 2012

                       CPU Statistics
     315.979006 time (seconds)       100 %
     169.867678 system time           54 %
       9.226622 rupt time              3 %   (2600865 rupts x 4 usec/rupt)
     160.641056 non-rupt system time  51 %
     462.090332 idle time            146 %

     150.136114 time in CP            48 %   100 %
       6.041689 rupt time in CP                4 %   (1561910 rupts x 4 usec/rupt)

                       Multiprocessor Statistics (per second)
                          cpu0       cpu1      total
sk switches           59900.95   56480.88  116381.83
hard switches         34656.38   39071.03   73727.41
domain switches         502.97     705.21    1208.18
CP rupts               4463.69     479.39    4943.08
nonCP rupts            2761.82     526.23    3288.05
IPI rupts                63.27       5.57      68.84
grab kahuna               0.23       0.28       0.51
grab w_xcleaner           0.00      71.94      71.94

grab kahuna usec          2.29       0.90       3.19
grab w_xcleaner usec      0.00   21738.43   21738.43
CP rupt usec          18316.76     803.78   19120.54
nonCP rupt usec        9435.77     643.80   10079.57
idle                 776445.65  685962.69 1462408.34
kahuna                    0.00  223157.23  223157.23
storage               38325.61   12057.61   50383.21
exempt                47537.74   31787.76   79325.50
raid                  34005.82   11549.42   45555.24
target                 4610.26    4937.77    9548.03
netcache                  0.00       0.00       0.00
netcache2                 0.00       0.00       0.00
cifs                  23013.17   15125.81   38138.99
wafl_exempt               0.00       0.00       0.00
wafl_xcleaner             0.00       0.00       0.00
sm_exempt                31.37      19.85      51.22
cluster                   0.00       0.00       0.00
protocol                  0.00       0.00       0.00
nwk_exclusive             0.00       0.00       0.00
nwk_exempt                0.00       0.00       0.00
nwk_legacy            48277.83   13954.28   62232.11
nwk_ctx1                  0.00       0.00       0.00
nwk_ctx2                  0.00       0.00       0.00
nwk_ctx3                  0.00       0.00       0.00
nwk_ctx4                  0.00       0.00       0.00

     120.076056 seconds with one or more CPUs active   ( 38%)

      76.889564 seconds with one CPU active            ( 24%)
      43.186492 seconds with both CPUs active          ( 14%)

                       Domain Utilization of Shared Domains (per second)
      0.00 idle                              0.00 kahuna
      0.00 storage                           0.00 exempt
      0.00 raid                              0.00 target
      0.00 netcache                          0.00 netcache2
      0.00 cifs                              0.00 wafl_exempt
      0.00 wafl_xcleaner                     0.00 sm_exempt
      0.00 cluster                           0.00 protocol
      0.00 nwk_exclusive                     0.00 nwk_exempt
      0.00 nwk_legacy                        0.00 nwk_ctx1
      0.00 nwk_ctx2                          0.00 nwk_ctx3
      0.00 nwk_ctx4


                       CSMP Domain Switches (per second)
   From\To       idle     kahuna    storage     exempt       raid     target   netcache  netcache2       cifs wafl_exempt wafl_xcleaner  sm_exempt    cluster   protocol nwk_exclusive nwk_exempt nwk_legacy   nwk_ctx1   nwk_ctx2   nwk_ctx3   nwk_ctx4
      idle       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
    kahuna       0.00       0.00      11.34       0.96      61.34       1.02       0.00       0.00     195.07       0.00       0.00       0.00       0.00       0.00       0.00       0.00      57.42       0.00       0.00       0.00       0.00
   storage       0.00      11.34       0.00       0.00     274.10       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       2.32       0.00       0.00       0.00       0.00
    exempt       0.00       0.96       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.42       0.00       0.00       0.00       0.00
      raid       0.00      61.34     274.10       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
    target       0.00       1.02       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.09       0.00       0.00       0.00       0.00
  netcache       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
netcache2       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
      cifs       0.00     195.07       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
wafl_exempt       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
wafl_xcleaner       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
sm_exempt       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
   cluster       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
  protocol       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
nwk_exclusive       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
nwk_exempt       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
nwk_legacy       0.00      57.42       2.32       0.42       0.00       0.09       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
  nwk_ctx1       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
  nwk_ctx2       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
  nwk_ctx3       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
  nwk_ctx4       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00

                       Miscellaneous Statistics (per second)
  73727.41 hard context switches             0.07 NFS operations
   1822.92 CIFS operations                   0.00 HTTP operations
      0.00 NetCache URLs                     0.00 streaming packets
   7524.12 network KB received            4445.83 network KB transmitted
  24311.18 disk KB read                  14180.27 disk KB written
   9675.15 NVRAM KB written                  0.00 nolog KB written
   2118.47 WAFL bufs given to clients        0.00 checksum cache hits  (   0%)
      0.00 no checksum - partial buffer    154.69 FCP operations
     79.51 iSCSI operations

                       WAFL Statistics (per second)
   3604.54 name cache hits      (  98%)     88.36 name cache misses    (   2%)
  86664.68 buf hash hits        (  86%)  14148.92 buf hash misses      (  14%)
  12829.76 inode cache hits     ( 100%)     13.07 inode cache misses   (   0%)
  12738.01 buf cache hits       (  88%)   1756.80 buf cache misses     (  12%)
    145.96 blocks read                    5578.99 blocks read-ahead
   1082.83 chains read-ahead               138.71 dummy reads
   3855.41 blocks speculative read-ahead   2851.32 blocks written
     12.05 stripes written                   0.00 blocks over-written
      0.03 wafl_timer generated CP           0.00 snapshot generated CP
      0.00 wafl_avail_bufs generated CP      0.00 dirty_blk_cnt generated CP
      0.03 full NV-log generated CP          0.05 back-to-back CP
      0.00 flush generated CP                0.13 sync generated CP
      0.00 wafl_avail_vbufs generated CP      0.03 deferred back-to-back CP
      0.00 container-indirect-pin CP         0.00 low mbufs generated CP
      0.00 low datavecs generated CP     11773.29 non-restart messages
     91.64 IOWAIT suspends             122333146.43 next nvlog nearly full msecs
      0.00 dirty buffer susp msecs          52.39 nvlog full susp msecs
    565192 buffers

                       RAID Statistics (per second)
    408.53 xors                              0.00 long dispatches [0]
      0.00 long consumed [0]                 0.00 long consumed hipri [0]
      0.00 long low priority [0]             0.00 long high priority [0]
      0.00 long monitor tics [0]             0.00 long monitor clears [0]
      0.00 long dispatches [1]               0.00 long consumed [1]
      0.00 long consumed hipri [1]           0.00 long low priority [1]
      0.00 long high priority [1]            0.00 long monitor tics [1]
      0.00 long monitor clears [1]             18 max batch
      8.56 blocked mode xor                130.55 timed mode xor
      2.53 fast adjustments                  1.07 slow adjustments
         0 avg batch start                      0 avg stripe/msec
     13.25 tetrises written                  0.00 master tetrises
      0.00 slave tetrises                  338.36 stripes written
     70.67 partial stripes                 267.70 full stripes
   2867.78 blocks written                  140.38 blocks read
      5.99 1 blocks per stripe size 9        2.40 2 blocks per stripe size 9
      1.67 3 blocks per stripe size 9        1.99 4 blocks per stripe size 9
      3.42 5 blocks per stripe size 9        5.56 6 blocks per stripe size 9
     12.85 7 blocks per stripe size 9       36.79 8 blocks per stripe size 9
    267.70 9 blocks per stripe size 9

                       Network Interface Statistics (per second)
iface    side      bytes    packets multicasts     errors collisions  pkt drops
e0P      recv      20.56       0.18       0.05       0.00                  0.00
         xmit      12.51       0.14       0.00       0.00       0.00
e0a      recv  161035.45    1006.36       0.00       0.00                  0.00
         xmit  289917.84     535.97       0.04       0.00       0.00
e0b      recv 6466822.64    5105.37       0.00       0.00                  0.00
         xmit 2994190.18    4591.79       0.03       0.00       0.00
e0c      recv 1070085.88    1644.35       0.00       0.00                  0.00
         xmit 1154351.26    1552.51       0.03       0.00       0.00
e0d      recv    6738.45      42.52       0.00       0.00                  0.00
         xmit  114060.17     105.06       0.00       0.00       0.00
vh       recv       0.00       0.00       0.00       0.00                  0.00
         xmit       0.00       0.00       0.00       0.00       0.00
vif01    recv 7707491.93    7778.64       3.54       0.00                  0.00
         xmit 4485364.45    6714.86       0.11       0.00       0.00
vif02    recv    6878.26      43.47       0.01       0.00                  0.00
         xmit  118060.92     108.36       0.00       0.00       0.00

                       Disk Statistics (per second)
        ut% is the percent of time the disk was busy.
        xfers is the number of data-transfer commands issued per second.
        xfers = ureads + writes + cpreads + greads + gwrites
        chain is the average number of 4K blocks per command.
        usecs is the average disk round-trip time per 4K block.

disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr0_SASdisks/plex0/rg0:
0d.01.0            3   8.10    0.93   1.06 10118   5.81  16.69   182   1.35   3.78   469   0.00   ....     .   0.00   ....     .
0d.01.2            3   8.33    0.92   1.06 13291   6.04  16.15   184   1.37   4.19   458   0.00   ....     .   0.00   ....     .
0d.01.4           25  73.77   66.56   3.12  4884   5.05  17.03   530   2.16   4.43  1432   0.00   ....     .   0.00   ....     .
0d.01.6           24  72.30   65.45   3.17  4707   4.54  18.83   546   2.32   4.76  1446   0.00   ....     .   0.00   ....     .
0d.01.8           25  72.08   65.34   3.24  4746   4.59  18.56   581   2.15   4.78  1370   0.00   ....     .   0.00   ....     .
0d.01.10          24  72.49   65.72   3.26  4704   4.58  18.61   560   2.18   4.78  1283   0.00   ....     .   0.00   ....     .
0d.01.12          24  72.11   65.46   3.20  4802   4.53  18.87   568   2.11   4.98  1510   0.00   ....     .   0.00   ....     .
0d.01.14          24  73.16   66.09   3.16  4718   4.70  17.94   592   2.37   5.00  1242   0.00   ....     .   0.00   ....     .
0d.01.16          25  73.00   66.21   3.16  4889   4.62  18.54   614   2.16   4.60  1694   0.00   ....     .   0.00   ....     .
0d.01.18          25  73.88   67.18   3.16  4795   4.47  19.14   568   2.22   4.68  1337   0.00   ....     .   0.00   ....     .
0d.01.20          24  72.54   65.73   3.13  4863   4.58  18.55   601   2.23   4.84  1463   0.00   ....     .   0.00   ....     .
/aggr1_SATAdisks/plex0/rg0:
0d.02.2            8  11.18    0.58   1.00 16228   9.25  26.17   399   1.35   6.16   671   0.00   ....     .   0.00   ....     .
0d.02.18           8  11.39    0.58   1.00 28214   9.51  25.51   424   1.31   5.29   814   0.00   ....     .   0.00   ....     .
0d.02.22          80  99.85   88.91   5.03  7772   9.26  25.32  1357   1.67   4.69  5874   0.00   ....     .   0.00   ....     .
0d.02.4           77  98.44   88.15   5.04  7084   8.69  26.87  1303   1.60   6.09  3727   0.00   ....     .   0.00   ....     .
0d.02.6           78  98.17   87.79   5.05  7206   8.74  26.70  1283   1.64   5.82  4108   0.00   ....     .   0.00   ....     .
0d.02.8           78  97.10   86.95   5.11  7108   8.63  27.10  1324   1.52   5.60  4260   0.00   ....     .   0.00   ....     .
0d.02.10          77  97.71   87.38   5.02  7295   8.69  26.76  1341   1.65   6.29  3969   0.00   ....     .   0.00   ....     .
0d.02.12          78  99.41   89.02   5.00  7469   8.77  26.57  1330   1.62   5.53  4288   0.00   ....     .   0.00   ....     .
0d.02.14          78  98.23   88.11   5.03  7235   8.66  27.01  1278   1.46   5.88  4100   0.00   ....     .   0.00   ....     .
0d.02.16          77  97.74   87.05   5.03  7208   8.81  26.08  1330   1.88   7.13  3392   0.00   ....     .   0.00   ....     .
0d.02.20          77  98.03   87.43   5.00  7278   8.78  26.54  1301   1.82   5.57  4240   0.00   ....     .   0.00   ....     .

Aggregate statistics:
Minimum            3   8.10    0.58                4.47                1.31                0.00                0.00
Mean              43  71.77   63.07                6.88                1.82                0.00                0.00
Maximum           80  99.85   89.02                9.51                2.37                0.00                0.00

Spares and other disks:
0d.01.1            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.3            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.5            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.7            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.9            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.11           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.13           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.15           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.17           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.19           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.21           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.22           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.01.23           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.0            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.1            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.3            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.5            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.7            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.9            0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.11           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.13           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.15           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.17           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.19           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.21           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .
0d.02.23           0   0.00    0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .   0.00   ....     .

                       FCP Statistics (per second)
2792330.86 FCP Bytes recv              3958402.35 FCP Bytes sent
    154.69 FCP ops

                       iSCSI Statistics (per second)
1303016.39 iSCSI Bytes recv            1195745.34 iSCSI Bytes xmit
     79.51 iSCSI ops

                       Tape Statistics (per second)

tape                             write bytes blocks    read bytes blocks
SHUSE-SAN01:7.125                 9849304.89   37.57         0.00    0.00
SHUSE-SAN01:7.125L1                  1659.25    0.01         0.00    0.00
SHUSE-SAN01:7.125L2                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L3                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L4                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L5                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L6                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L7                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L8                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L9                     0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L10                    0.00    0.00         0.00    0.00
SHUSE-SAN01:7.125L11                    0.00    0.00         0.00    0.00

                       Interrupt Statistics (per second)
   2000.03 Clock (IRQ 0)                  4061.30 PCI direct (IRQ 16)
   2100.60 PCI direct (IRQ 17)               0.00 RTC
     68.84 IPI                            8230.77 total

                       NVRAM Statistics (per second)
      0.00 total dma transfer KB             0.00 wafl write req data KB
      0.00 dma transactions                  0.00 dma descriptors
   2787.38 waitdone preempts                 0.01 waitdone delays
      0.02 transactions not queued         335.84 transactions queued
    336.80 transactions done                42.81 total waittime (MS)
   1479.39 completion wakeups              197.86 nvdma completion wakeups
    118.72 nvdma completion waitdone      9674.19 total nvlog KB
      0.00 nvlog shadow header array full      0.00 channel1 dma transfer KB
      0.00 channel1 dma transactions         0.00 channel1 dma descriptors

                       E7520 Data Mover Statistics (per second)
  10334.55 total dma transfer KB             4.94 total bcopy transfer KB
      2.60 total waittime (MS)

------------------------------------------------------------------------------------------------------------------------

I also got some output from sysstat while the statit was running:

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
26%     0  1947     0    2124 18008  2455  20341   2475     0     0     8s  67%  13%  Fn  73%    136    41   287   439   306  1181
71%     0  3462     0    3564 66792  2633  13299  89953     0     0    28s  95%  95%  Fn  84%     80    22   342   303   281   276
71%     0  3618     0    3723 64474  3447  19938  78679     0     0    28s  92%  61%  Fn  78%     75    30   262   181   230   904
65%     0  3437     0    3535 62042  2941  14108  82709     0     0     4s  94%  89%  F   82%     69    29   283   146   352   395
70%     0  3449     0    3570 67054  3342  18552  83138     0     0    30s  93%  84%  Ff  85%     86    35   464    74   308   802
17%     0  1852     0    1956   960  2431  22146      0     0     0     4s  62%   0%  -   85%     55    49   206    38   540   958
17%     1  1625     0    1805   700  2590  23444     12     0     0     8s  64%   0%  -   83%    161    18   594  2425   369     0
19%     0  1707     0    2427   903  2403  27926      0     0     0     8s  65%   0%  -   84%    667    53   904  4038   514  1187
48%     1  3718     0    8367  2324 73504  87383  21052     0     0     3s  92% 100%  :f  79%   4611    37   247 18768   218   701
33%     0  3156     0    3230  1554 37708  45966   7364     0     0     3s  81%  99%  Zf  94%     47    27   432     9   225   197
32%     0  3168     0    3245  1612 44303  58272   3710     0     0     2s  78%  99%  Zf  98%     45    32   726    17   143   985
31%     0  3233     0    3311  1830 43011  50216   5395     0     0    54s  85%  99%  Zf 100%     58    20   481    16   386     0
34%     0  3611     0    3750  1896 48525  56898   3962     0     0    57s  82%  99%  Zf  98%     78    61   945    20   289   852
30%     0  3106     0    3203  1699 43288  58432   5382     0     0    58s  86%  99%  Zn  95%     63    34   774     2   277   335
32%     0  2992     0    3104  1880 50770  66186   5510     0     0    59s  87%  99%  Zn  98%     58    54   514    18   317    66
29%     0  3019     0    3209  1630 41634  55848   6352     0     0     1   86%  99%  Zn 100%    141    49  1023   296   253  1116
43%     0  4055     0    4315  2447 69478  74662  11116     0     0     1   89%  99%  Zf  89%    196    64  4232  1454   362   453

Now I do see some massive disk usage on the SATA aggregate. But there is still little traffic on the interfaces, and a poor 24.67MB/s of activity from the disks (at 80% disk utilization?).
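
To put numbers on that, here is a minimal sketch of how the per-disk statit columns convert into throughput, using the legend from the output (each transfer moves "chain" 4K blocks); the sample values are disk 0d.02.22 from the SATA raid group above:

----------------------------------------------------------------------------------
# Sketch: per-disk throughput from the statit disk columns (disk 0d.02.22 above).
BLOCK_KB = 4                           # statit chains are counted in 4K blocks

ureads, uread_chain   = 88.91, 5.03    # user reads/s, avg blocks per read
writes, write_chain   =  9.26, 25.32   # writes/s, avg blocks per write
cpreads, cpread_chain =  1.67, 4.69    # CP reads/s, avg blocks per CP read

read_kbs  = ureads  * uread_chain  * BLOCK_KB   # ~1789 KB/s
write_kbs = writes  * write_chain  * BLOCK_KB   # ~ 938 KB/s
cp_kbs    = cpreads * cpread_chain * BLOCK_KB   # ~  31 KB/s

print(f"total: {(read_kbs + write_kbs + cp_kbs) / 1024:.1f} MB/s per data disk")
# Each busy SATA data disk moves ~2.7 MB/s total (~1.8 MB/s of user reads).
# Nine of them give ~16 MB/s of reads; the SAS aggregate supplies the rest of
# the ~24 MB/s "disk KB read" figure. Short, seek-bound chains saturate the
# spindles (~80% ut%) long before their sequential bandwidth is reached.
----------------------------------------------------------------------------------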

JOSE_TOME

Is it possible, from the statit output, to calculate the average read/write IOPS requested from the array, in order to compare it with the "theoretical" IOPS the array is capable of serving given the number of disks it has? Does that make sense?

peter_lehmann

I'd say the SATA disks are the bottleneck (no surprise). You do have a lot of CIFS IOPS while the backup is running, and this is slowing it down too, because a lot of CPs are being generated. Maybe you can move the CIFS activity and the backup activity to different timeslots; that would certainly help.

Comparing the current IOPS with the "theoretical" IOPS is difficult but can be done. I'd recommend getting someone from NetApp or a partner company with performance troubleshooting experience involved at this stage.
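
As a very rough starting point, a sketch using common rule-of-thumb per-spindle figures (these are planning assumptions, not measured or guaranteed values):

----------------------------------------------------------------------------------
# Sketch: measured xfers/s vs. rule-of-thumb spindle capability (assumptions!).
SATA_7K_IOPS = 75            # assumed random IOPS per 7.2k SATA spindle
data_disks   = 9             # data drives in the SATA RAID-DP group

measured_xfers_per_disk = 98.0   # ~xfers/s per busy SATA disk in the sample

print(f"theoretical random IOPS: ~{SATA_7K_IOPS * data_disks}")                  # ~675
print(f"measured xfers/s       : ~{measured_xfers_per_disk * data_disks:.0f}")   # ~882
# xfers include chained sequential transfers, so exceeding the nominal random
# budget is possible -- but together with the ~80% ut% it does suggest the
# spindles have very little headroom left.
----------------------------------------------------------------------------------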

JOSE_TOME

Hi.

Thanks for the response.

I've been doing some tests. Yesterday I created a new 1TB volume on the same SATA aggregate and copied in about 100GB of files of 1-1.5GB each. I did a dump to null and took some statit/sysstat info. The dump completed in 6 minutes; I registered a throughput of about 1TB/h (close to 300MB/s).

Today I decided to do another dump to null with a production volume, specifically the userfiles share, which has about 3TB of data in small files (about 3 million files). The dump was aborted after about one hour, having read 240GB, for a throughput close to 300GB/h (88MB/s). I also collected statit/sysstat info for this run.

Both volumes are on the same aggregate, meaning the same physical disks. The only thing I can "conclude" from this is that the directory structure, along with the file size and the number of files, is impacting the read process.

Here's some output from the dump, where you can see the time spent on each Pass of the dump:

----------------------------------------------------------------------------------

DUMP: creating "/vol/usuarios/../snapshot_for_backup.511" snapshot.
DUMP: Using Full Volume Dump
DUMP: Dumping tape file 1 on null
DUMP: Date of this level 0 dump: Tue Dec 18 10:09:03 2012.
DUMP: Date of last level 0 dump: the epoch.
DUMP: Dumping /vol/usuarios to null
DUMP: mapping (Pass I)[regular files]
DUMP: mapping (Pass II)[directories]
DUMP: estimated 3080484582 KB.
DUMP: dumping (Pass III) [directories]
DUMP: Tue Dec 18 10:21:01 2012 : We have written 370385 KB.
DUMP: Tue Dec 18 10:26:01 2012 : We have written 1142394 KB.
DUMP: dumping (Pass IV) [regular files]
DUMP: Tue Dec 18 10:31:01 2012 : We have written 11003960 KB.
DUMP: Tue Dec 18 10:36:01 2012 : We have written 43909314 KB.
DUMP: Tue Dec 18 10:41:01 2012 : We have written 82547223 KB.
DUMP: Tue Dec 18 10:46:01 2012 : We have written 116505114 KB.
DUMP: Tue Dec 18 10:51:01 2012 : We have written 149442003 KB.
DUMP: Tue Dec 18 10:56:01 2012 : We have written 183890952 KB.
DUMP: Tue Dec 18 11:01:01 2012 : We have written 219154461 KB.
DUMP: Tue Dec 18 11:06:01 2012 : We have written 251863963 KB.

----------------------------------------------------------------------------------
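
The Pass IV progress lines can be turned into interval throughput; a quick sketch (timestamps and KB counters copied from the output above):

----------------------------------------------------------------------------------
# Sketch: interval throughput from the cumulative Pass IV progress lines.
from datetime import datetime

samples = [  # (time, cumulative KB written), copied from the dump output
    ("10:31:01", 11003960),  ("10:36:01", 43909314),
    ("10:41:01", 82547223),  ("10:46:01", 116505114),
    ("10:51:01", 149442003), ("10:56:01", 183890952),
    ("11:01:01", 219154461), ("11:06:01", 251863963),
]

fmt = "%H:%M:%S"
for (t0, kb0), (t1, kb1) in zip(samples, samples[1:]):
    secs = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).seconds
    print(f"{t0} -> {t1}: {(kb1 - kb0) / 1024 / secs:6.1f} MB/s")
# Pass IV itself runs at roughly 105-125 MB/s; the gap down to the ~88 MB/s
# overall rate is the wall clock spent in the mapping and directory passes,
# which scale with the number of files rather than the amount of data.
----------------------------------------------------------------------------------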

I've read about other environments with the same array that have many more millions of files than we do. Can I actually conclude that this is what is hurting the backups? How can I prove it (with numbers)?
