ONTAP Discussions

FAS2020 Performance issues

NETDIREKT
6,923 Views

Hello,

We are using the FAS2020 at 5 different locations.

We just updated to 7.3.6 because of some bugs.

CPU usage: generally around 90%

I/O throughput: 383 KB/s

Ops/sec: 2500

Read latency: 8000 msec

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache

                                  in   out   read  write  read write   age   hit

82%  2242     0     0    2242 18016 44599  58914  16986     0     0     3s  80%

65%  2050     0     0    2050  6394 46315  57136   5035     0     0     2s  83%

68%  1807     0     0    1807 11131 39836  54533  10247     0     0     2s  79%

67%  1833     0     0    1833 10571 37530  50131  10699     0     0     2s  82%

73%  2025     0     0    2025 13901 39879  53675  11781     0     0     3s  81%

Shouldn't this performance be better? What are your I/O and CPU values?

Regards

18 REPLIES

jakob_bena
6,887 Views

Hello,

I don't think you will get much more performance from this system; it is the smallest filer in this line.

Maybe someone else has had different experiences.

Regards

stevensmithSCC
6,887 Views

Hi there,

Performance problems can be somewhat difficult to identify. I can give you some pointers here, but you will likely need someone with performance analysis skills to go through the system and identify the bottleneck, likely NetApp PS or your own partner's PS.

Firstly, you are pushing 50-70 MB/sec through the system, which is not an insignificant amount of data for a FAS2020. But the CPU does not seem too busy, and the standard sysstat CPU busy measure is not perfect. If you use "sysstat -m 1" you will be able to view busy states on all cores in the system in near real time. But I doubt the system is CPU limited; normally CPU does not become an issue until the system is running at 90%+ utilisation all of the time.

This leads on to the likely culprit: disk utilisation. To get an idea of how busy the disks are, "sysstat -us 1" will give a figure for disk utilisation and an idea of whether the disks are the bottleneck. You have not given any indication of the number and type of disks installed in the system, so it's difficult to tell if this is the bottleneck.

Finally, you have the "stats" command. This is used to collect data from counters built into the code and provides information on individual disks, volumes, LUNs etc. This can be used to determine where the performance bottleneck is.
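For example, a minimal sketch (the volume name vol3 is just an illustration; "stats list objects" shows which objects and counters exist on your release):

stats show system:system:avg_processor_busy
stats show volume:vol3:read_latency

Consistently high processor busy with modest per-volume latency would point at CPU rather than a single hot volume.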

Let me know if you have any more questions.

Cheers

Steve

jakob_bena
6,887 Views

What kind of disks are in use?

How many VMware machines are running on this system?

Are there any tools running, like SnapMirror or SnapVault?

When does your deduplication start?

In priv set diag mode you can take a look at your CPU performance with the command "sysstat -M 1".

NETDIREKT
6,887 Views

Hello,


Thank you for replies.

I think my problem started when I upgraded to 7.3.6.

We are using SAS 15K disks.

There is no VMware on the system. This is a CDN storage system; 5 servers are connected via NFS.

heinowalther
6,887 Views

Depending on which version you upgraded from, there might be some filesystem operations running after the upgrade, which will take up some IOs.

But basically we need to know the aggregate configuration...  (aggr status -r)

Also, if you have added disks to the aggregate, you might need to do a reallocate on the volumes to spread them out across all the disks...

You can measure whether this will make sense with the "reallocate measure" command...
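A minimal sketch, using a volume name from later in this thread as an example:

reallocate measure /vol/vol3
reallocate status -v

The measure job logs an optimization rating for the volume; if it comes back above the threshold, "reallocate start /vol/vol3" would spread the blocks out.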

Also, "sysstat -u 1" or even "sysstat -x 1" might show some more info, as they show the CPs... it might be that you are writing a lot of small IOs to the filer...

/Heino

NETDIREKT
6,887 Views

Hello,

aggr status -r

Aggregate aggr0 (online, raid4) (block checksums)

  Plex /aggr0/plex0 (online, normal, active, pool0)

    RAID group /aggr0/plex0/rg0 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------

      parity    0c.00.3         0c    0   3   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.7         0c    0   7   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.2         0c    0   2   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.11        0c    0   11  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.6         0c    0   6   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

      data      0c.00.8         0c    0   8   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.10        0c    0   10  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

      data      0c.00.4         0c    0   4   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.1         0c    0   1   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.9         0c    0   9   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

      data      0c.00.5         0c    0   5   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

  Pool1 spare disks (empty)

  Pool0 spare disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

---------       ------          ------------- ---- ---- ---- ----- --------------    --------------

Spare disks for block or zoned checksum traditional volumes or aggregates

spare           0c.00.0         0c    0   0   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

sysstat -u 1

CPU   Total    Net kB/s    Disk kB/s    Tape kB/s Cache Cache  CP  CP Disk

       ops/s    in   out   read  write  read write   age   hit time ty util

99%    2656 24611 41014  54957  45706     0     0    14s  75%  90%  F  81%

100%    2572 29682 41593  71038  14782     0     0    12s  70%  51%  F  75%

99%    3058 29519 44912  64488  25407     0     0     2s  70%  75%  :  78%

99%    2257 20982 40047  59549  33126     0     0     2s  75%  93%  F  76%

98%    2374 20012 40556  63405  11652     0     0     2s  71%  57%  F  75%

100%    2671 21475 41520  67612  25684     0     0     2s  79%  65%  :  95%

100%    2494 15177 39469  64092  24460     0     0     2s  77%  68%  F  91%

sysstat -x 1

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s

                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out

98%  2318     0     0    2318 13895 41763  59264  33904     0     0     2s  72%  82%  Ff  81%      0     0     0     0     0     0

91%  3137     0     0    3137 21804 49027  71080   4504     0     0     2s  72%  25%  :   71%      0     0     0     0     0     0

99%  2715     0     0    2715 19775 42026  65614  21151     0     0     2s  71%  80%  Ff  79%      0     0     0     0     0     0

100%  2918     0     0    2918 23577 48491  71632  17542     0     0    18s  75%  83%  Fs  84%      0     0     0     0     0     0

95%  2105     0     0    2105  8759 46881  69399  30585     0     0     2s  79% 100%  :f  99%      0     0     0     0     0     0

100%  2677     0     0    2677 16101 52417  73874     87     0     0     2s  69%  16%  :   85%      0     0     0     0     0     0

100%  2636     0     0    2636 20345 49581  77934  11960     0     0     2s  71%  49%  Fs  79%      0     0     0     0     0     0

Also, we upgraded from 7.3.2 to 7.3.6 three days ago.

Best Regards

radek_kubka
6,886 Views
Read latency: 8000 msec

You mean 8000 microseconds? That translates to a mere 8 milliseconds, which can be deemed good performance; normally anything below 20 ms is good.

Regards,

Radek

NETDIREKT
6,886 Views

I know this is already a good value. My problem is the high CPU usage.

jakob_bena
6,886 Views

Do you have any stats from your system from when it was on 7.3.2?

The only thing I see is that your system has high usage: net in/out, disk in/out.

Have you checked out "sysstat -M 1"?

Regards

NETDIREKT
6,199 Views

Sorry, but there is no "-M" option:

usage: sysstat [-c count] [-s] [-u | -x | -f | -i | -b] [interval]

-c count        - the number of iterations to execute

-s              - print out summary statistics when done

-u              - print out utilization format instead

-x              - print out all fields (overrides -u)

-f              - print out FCP target statistics

-i              - print out iSCSI target statistics

-b              - print out SAN statistics

I upgraded all 5 of our storage systems to 7.3.6, so I don't have any 7.3.2 logs.

jakob_bena
6,199 Views

Switch into privileged mode with "priv set diag"; there you can execute this option.

NETDIREKT
6,199 Views

priv set diag

Warning: These diagnostic commands are for use by NetApp

         personnel only.

I changed the priv mode but there is still no -M option. Any ideas?

jakob_bena
6,199 Views

Have you tried it?

NETDIREKT
6,199 Views

Of course:

*> priv set diag

*> sysstat -M 1

usage: sysstat [-c count] [-s] [-u | -x | -f | -i | -b] [interval]

-c count        - the number of iterations to execute

-s              - print out summary statistics when done

-u              - print out utilization format instead

-x              - print out all fields (overrides -u)

-f              - print out FCP target statistics

-i              - print out iSCSI target statistics

-b              - print out SAN statistics

interval        - the interval between iterations in seconds, default is 15 seconds

jakob_bena
6,199 Views

I have a FAS2040; my output looks like this:

usage: sysstat [-c count] [-s] [-u | -x | -m | -f | -i | -b] [interval]

-c count        - the number of iterations to execute

-s              - print out summary statistics when done

-u              - print out utilization format instead

-x              - print out all fields (overrides -u)

-m              - print out multiprocessor statistics

-f              - print out FCP target statistics

-i              - print out iSCSI target statistics

-b              - print out SAN statistics

interval        - the interval between iterations in seconds, default is 15 seconds

filer2*> sysstat -M 1

ANY1+ ANY2+  AVG CPU0 CPU1 Network Storage Raid Target Kahuna WAFL_XClean SM_Exempt Cifs Exempt Intr Host Ops/s   CP

   0%    0%   0%   0%   0%      0%      0%   0%     0%     0%          0%        0%   0%     0%   0%   0%     0   0%

   2%    0%   1%   0%   2%      0%      0%   0%     0%     2%          0%        0%   0%     0%   0%   0%     0  10%

   0%    0%   0%   0%   0%      0%      0%   0%     0%     0%          0%        0%   0%     0%   0%   0%     0   0%

   0%    0%   0%   0%   0%      0%      0%   0%     0%     0%          0%        0%   0%     0%   0%   0%     0   0%

columbus_admin
4,906 Views

The 2020 is a single CPU/single core, so there is no "-M" option.  The 2040 is a single CPU/dual core, so "-M" is viable there.

Check that none of your NFS hosts are accessing the filer over the e0M management interface. A single GbE host connecting to that interface, which is only 10/100, causes TCP buffer offloading to the CPU.
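A rough way to verify this from the filer console (assuming netstat is available on your release and NFS runs over TCP port 2049):

ifconfig e0M
netstat -an

Compare the local addresses of the established NFS connections against the IP configured on e0M.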

Also check to make sure that none of your hosts are using the mount options actimeo=0 or noac; these cause a full attribute fetch every time a host looks at, touches, writes, reads, or modifies a file. The 2020 would be hard pressed, even with only five hosts, to run with that mount option set. These options are used mainly for databases... again, the 2020 is not a high-end storage system and is not capable of serving data with these requirements beyond a couple of machines.
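A quick sketch of how to check the active options on each client:

mount -t nfs
grep nfs /proc/mounts    (on Linux)

Look for actimeo=0 or noac in the option string of each NFS mount.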

- Scott

NETDIREKT
4,906 Views

None of our NFS hosts are accessing via e0M. All of them connect via GbE.

cat /etc/fstab

# Device                Mountpoint      FStype  Options         Dump    Pass#

/dev/ada0s1b            none            swap    sw              0       0

/dev/ada0s1a            /               ufs     rw              1       1

/dev/acd0               /cdrom          cd9660  ro,noauto       0       0

10.50.10.10:/www        /var/www        nfs     rw,tcp,async,noatime,nfsv3,wsize=65536,rsize=65536 0 0

10.50.10.10:/vol/vol3        /storage        nfs     rw,tcp,async,noatime,nfsv3,wsize=65536,rsize=65536 0 0

rajdeepsengupta
4,906 Views

We had a similar problem with our 3020 system, though not due to an ONTAP upgrade; it just started one day.

In our case the issue was with the Network domain being busy, but I wonder why it is happening across all sites in your case. Since your storage does not support sysstat -M (capital M), it is difficult to debug the cause of the CPU being busy; the only way out is to send a perfstat to NetApp support, to find out which process is keeping the CPU busy.
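For reference, the classic 7-mode perfstat is run from a separate admin host, along these lines (flags vary between perfstat versions, so treat this as a sketch; <filer-ip> is a placeholder):

perfstat -f <filer-ip> -t 2 -i 5 > perfstat.out

That would collect 5 iterations of 2 minutes each and bundle the counters NetApp support needs.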
