ONTAP 8.1.2 7-Mode cpu_busy counter

GARDINEC_EBRD

Hi All,

I must be having a bad day or something, but I can't get my head around this today.  I've got a FAS3240 running at close to 100% on the cpu_busy counter.  Latency looks fine, so whatever it is isn't causing a performance issue, but it is generating CPU alerts in DFM.

I have a number of NDMP tape-to-tape operations running, which seem to be generating this, as the VMware over NFS load is relatively light (1000-2000 NFS ops/sec).

So, sysstat 1 looks like this:

  CPU     NFS    CIFS    HTTP     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache
                                   in    out    read  write    read  write    age
  92%    1165       0       0   15273   4320   31932  47209  145214 145214    51s
  98%    2662       0       0    7119   1015   14508  55220  149094 149094    51s
  99%    1251       0       0   13042   5001    3756  51936  151912 151978    51s
  99%     650       0       0    5901    671     456   7308  155058 154993    51s
  99%    1190       0       0    8936    757     296      8  154927 154927    51s
  99%    2134       0       0    5885    899     256      0  154206 154206    51s

It doesn't appear that any single CPU core is at 98% (sysstat -m)...

 ANY  AVG  CPU0 CPU1 CPU2 CPU3
100%  58%   58%  57%  57%  58%
100%  54%   54%  54%  53%  55%
100%  57%   57%  56%  56%  58%
100%  55%   56%  55%  54%  56%
100%  77%   78%  77%  77%  77%

...so I assume it's a CPU domain that's maxed out.  Looking at the statit output, though, I can't see any particular domain that's at high utilization:

   NetApp Release 8.1.2 7-Mode: Tue Oct 30 19:56:51 PDT 2012
    <1O>
  Start time: Wed Mar 27 09:32:55 GMT 2013

                       CPU Statistics
      14.141772 time (seconds)       100 %
      28.341048 system time          200 %
       0.611241 rupt time              4 %   (211010 rupts x 3 usec/rupt)
      27.729807 non-rupt system time 196 %
      28.226036 idle time            200 %
       2.514662 time in CP            18 %   100 %
       0.104331 rupt time in CP                4 %   (37134 rupts x 3 usec/rupt)

                       Multiprocessor Statistics (per second)
                          cpu0       cpu1       cpu2       cpu3      total
sk switches          161727.33  163272.40  163634.80  165143.94  653778.47
hard switches         78693.46   79794.03   80383.21   83490.46  322361.16
domain switches       45627.52   45924.02   46501.74   49155.37  187208.65
CP rupts                730.25     355.47     406.10    1134.02    2625.84
nonCP rupts            3352.41    1643.64    1731.04    5568.11   12295.21
IPI rupts                 0.00       0.00       0.00       0.00       0.00
grab kahuna               0.00       0.00       0.00       0.00       0.00
grab kahuna usec          0.00       0.00       0.00       0.00       0.00
CP rupt usec           4324.63     127.00     509.55    2416.25    7377.51
nonCP rupt usec       20245.62     579.42    2214.01   12805.68   35844.80
idle                 502422.54  502765.85  503554.86  487190.15 1995933.54
kahuna                12694.87   12387.56   12685.04   13127.49   50895.11
storage               28276.80   29771.09   30174.72   26242.11  114464.79
exempt               120301.76  122678.12  121024.58  124690.03  488694.63
raid                   4533.66    4430.14    4064.55    4970.66   17999.16
target                  626.23     744.81     609.47     849.12    2829.77
dnscache                  0.00       0.00       0.00       0.00       0.00
cifs                     26.87      46.10      55.79      65.20     194.04
wafl_exempt           33716.85   30728.04   29905.73   28599.32  122950.08
wafl_xcleaner          2258.34    2442.83    2412.92    1487.15    8601.33
sm_exempt                13.15      19.23      17.47      20.79      70.78
cluster                   0.00       0.00       0.00       0.00       0.00
protocol                 34.93      33.16      50.63      36.63     155.43
nwk_exclusive           629.98     857.46     483.53     895.08    2866.26
nwk_exempt            34561.79   51149.32   49054.67   51691.61  186457.47
nwk_legacy           222720.96  227699.82  229450.67  244670.12  924541.78
hostOS                12610.30   13539.39   13731.16     241.98   40122.98

      13.862242 seconds with one or more CPUs active   ( 98%)
       9.259903 seconds with 2 or more CPUs active     ( 65%)
       3.400337 seconds with 3 or more CPUs active     ( 24%)

       4.602338 seconds with one CPU active            ( 33%)
       5.859565 seconds with 2 CPUs active             ( 41%)
       2.451869 seconds with 3 CPUs active             ( 17%)
       0.948468 seconds with all CPUs active           (  7%)

                        Domain Utilization of Shared Domains (per second)
      0.00 idle                         106174.04 kahuna
      0.00 storage                           0.00 exempt
      0.00 raid                              0.00 target
      0.00 dnscache                          0.00 cifs
      0.00 wafl_exempt                       0.00 wafl_xcleaner
      0.00 sm_exempt                         0.00 cluster
      0.00 protocol                     956506.30 nwk_exclusive
      0.00 nwk_exempt                        0.00 nwk_legacy
      0.00 hostOS
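
As a rough cross-check of the statit figures, here is a small Python sketch that converts the "total" column of the Multiprocessor Statistics table into a percentage of one CPU and ranks the domains. It assumes the per-second domain values are microseconds of CPU time per second of wall-clock time (so 1,000,000 would be one fully busy CPU), which seems consistent with the rows adding up to roughly 4,000,000 on this 4-CPU box. The function names are made up for illustration and are not part of any NetApp tool.

```python
# "total" column copied from the Multiprocessor Statistics table in the post.
# Assumption: units are microseconds of CPU time per second of wall clock,
# so 1,000,000 usec/sec corresponds to one fully busy CPU.
DOMAIN_TOTALS_USEC = {
    "idle": 1995933.54,
    "kahuna": 50895.11,
    "storage": 114464.79,
    "exempt": 488694.63,
    "raid": 17999.16,
    "target": 2829.77,
    "dnscache": 0.00,
    "cifs": 194.04,
    "wafl_exempt": 122950.08,
    "wafl_xcleaner": 8601.33,
    "sm_exempt": 70.78,
    "cluster": 0.00,
    "protocol": 155.43,
    "nwk_exclusive": 2866.26,
    "nwk_exempt": 186457.47,
    "nwk_legacy": 924541.78,
    "hostOS": 40122.98,
}

USEC_PER_SEC = 1_000_000


def percent_of_one_cpu(usec_per_sec):
    """Express a usec/sec figure as a percentage of a single CPU."""
    return 100.0 * usec_per_sec / USEC_PER_SEC


def busiest_domains(totals, top=3):
    """Return the `top` non-idle domains as (name, percent of one CPU)."""
    busy = {name: usec for name, usec in totals.items() if name != "idle"}
    ranked = sorted(busy.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(percent_of_one_cpu(usec), 1)) for name, usec in ranked[:top]]


def any_cpu_active_percent(active_seconds, sample_seconds):
    """Share of the sample during which at least one CPU was active."""
    return 100.0 * active_seconds / sample_seconds


if __name__ == "__main__":
    for name, pct in busiest_domains(DOMAIN_TOTALS_USEC):
        print(f"{name:<15}{pct:6.1f}% of one CPU")
    # Values taken from the "seconds with one or more CPUs active" and
    # "time (seconds)" lines of the same statit sample.
    print(f"any CPU active: {any_cpu_active_percent(13.862242, 14.141772):.1f}%")
```

Under that assumption, the figures above put nwk_legacy at roughly 92% of a single CPU, with exempt next at about 49%. The last line reproduces the 98% "one or more CPUs active" figure, which is in the same range as the CPU column in sysstat. Whether that is what actually drives cpu_busy isn't something this output confirms by itself.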

Can anyone see what I'm missing????

Thanks,

Craig
