Hi All,
I must be having a bad day or something, but I can't get my head around this today. I've got a FAS3240 running at close to 100% on the cpu_busy counter. Latency looks fine, so whatever it is isn't causing a performance issue, but it is generating CPU alerts in DFM.
I have a number of NDMP tape-to-tape operations running, which seem to be generating this, as the VMware over NFS load is relatively light (1000-2000 NFS ops/sec).
So, sysstat 1 looks like this:
 CPU   NFS  CIFS  HTTP       Net kB/s      Disk kB/s        Tape kB/s  Cache
                            in    out    read  write     read   write    age
 92%  1165     0     0   15273   4320   31932  47209   145214  145214    51s
 98%  2662     0     0    7119   1015   14508  55220   149094  149094    51s
 99%  1251     0     0   13042   5001    3756  51936   151912  151978    51s
 99%   650     0     0    5901    671     456   7308   155058  154993    51s
 99%  1190     0     0    8936    757     296      8   154927  154927    51s
 99%  2134     0     0    5885    899     256      0   154206  154206    51s
It doesn't appear to be a CPU core that's at 98% (sysstat -m)...
 ANY  AVG CPU0 CPU1 CPU2 CPU3
100%  58%  58%  57%  57%  58%
100%  54%  54%  54%  53%  55%
100%  57%  57%  56%  56%  58%
100%  55%  56%  55%  54%  56%
100%  77%  78%  77%  77%  77%
...so I assume it's a CPU domain that's maxed out. Looking at the statit output, I can't see any particular domain that's at high utilization:
NetApp Release 8.1.2 7-Mode: Tue Oct 30 19:56:51 PDT 2012
<1O>
Start time: Wed Mar 27 09:32:55 GMT 2013
CPU Statistics
14.141772 time (seconds) 100 %
28.341048 system time 200 %
0.611241 rupt time 4 % (211010 rupts x 3 usec/rupt)
27.729807 non-rupt system time 196 %
28.226036 idle time 200 %
2.514662 time in CP 18 % 100 %
0.104331 rupt time in CP 4 % (37134 rupts x 3 usec/rupt)
Multiprocessor Statistics (per second)
                          cpu0        cpu1        cpu2        cpu3       total
sk switches          161727.33   163272.40   163634.80   165143.94   653778.47
hard switches         78693.46    79794.03    80383.21    83490.46   322361.16
domain switches       45627.52    45924.02    46501.74    49155.37   187208.65
CP rupts                730.25      355.47      406.10     1134.02     2625.84
nonCP rupts            3352.41     1643.64     1731.04     5568.11    12295.21
IPI rupts                 0.00        0.00        0.00        0.00        0.00
grab kahuna               0.00        0.00        0.00        0.00        0.00
grab kahuna usec          0.00        0.00        0.00        0.00        0.00
CP rupt usec           4324.63      127.00      509.55     2416.25     7377.51
nonCP rupt usec       20245.62      579.42     2214.01    12805.68    35844.80
idle                 502422.54   502765.85   503554.86   487190.15  1995933.54
kahuna                12694.87    12387.56    12685.04    13127.49    50895.11
storage               28276.80    29771.09    30174.72    26242.11   114464.79
exempt               120301.76   122678.12   121024.58   124690.03   488694.63
raid                   4533.66     4430.14     4064.55     4970.66    17999.16
target                  626.23      744.81      609.47      849.12     2829.77
dnscache                  0.00        0.00        0.00        0.00        0.00
cifs                     26.87       46.10       55.79       65.20      194.04
wafl_exempt           33716.85    30728.04    29905.73    28599.32   122950.08
wafl_xcleaner          2258.34     2442.83     2412.92     1487.15     8601.33
sm_exempt                13.15       19.23       17.47       20.79       70.78
cluster                   0.00        0.00        0.00        0.00        0.00
protocol                 34.93       33.16       50.63       36.63      155.43
nwk_exclusive           629.98      857.46      483.53      895.08     2866.26
nwk_exempt            34561.79    51149.32    49054.67    51691.61   186457.47
nwk_legacy           222720.96   227699.82   229450.67   244670.12   924541.78
hostOS                12610.30    13539.39    13731.16      241.98    40122.98
13.862242 seconds with one or more CPUs active ( 98%)
9.259903 seconds with 2 or more CPUs active ( 65%)
3.400337 seconds with 3 or more CPUs active ( 24%)
4.602338 seconds with one CPU active ( 33%)
5.859565 seconds with 2 CPUs active ( 41%)
2.451869 seconds with 3 CPUs active ( 17%)
0.948468 seconds with all CPUs active ( 7%)
Domain Utilization of Shared Domains (per second)
0.00 idle 106174.04 kahuna
0.00 storage 0.00 exempt
0.00 raid 0.00 target
0.00 dnscache 0.00 cifs
0.00 wafl_exempt 0.00 wafl_xcleaner
0.00 sm_exempt 0.00 cluster
0.00 protocol 956506.30 nwk_exclusive
0.00 nwk_exempt 0.00 nwk_legacy
0.00 hostOS
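As a sanity check on the per-domain rows, here is a rough Python sketch that turns the "total" column of the Multiprocessor Statistics block into a percentage of one CPU. It assumes those figures are microseconds of CPU time per wall-clock second (the cpu0 column, interrupt rows included, adds up to roughly 1,000,000, which is consistent with that reading). The dictionary and function names are made up for illustration, and this is only arithmetic on the pasted numbers, not a statement of how cpu_busy itself is calculated.

```python
# "total" column copied from the statit Multiprocessor Statistics block.
# Assumption: each value is microseconds of CPU time per second, summed over all CPUs.
DOMAIN_TOTALS_USEC = {
    "idle": 1995933.54,
    "kahuna": 50895.11,
    "storage": 114464.79,
    "exempt": 488694.63,
    "raid": 17999.16,
    "target": 2829.77,
    "dnscache": 0.00,
    "cifs": 194.04,
    "wafl_exempt": 122950.08,
    "wafl_xcleaner": 8601.33,
    "sm_exempt": 70.78,
    "cluster": 0.00,
    "protocol": 155.43,
    "nwk_exclusive": 2866.26,
    "nwk_exempt": 186457.47,
    "nwk_legacy": 924541.78,
    "hostOS": 40122.98,
}


def percent_of_one_cpu(usec_per_sec):
    """Express usec-per-second as a percentage of a single CPU (1,000,000 usec/s)."""
    return usec_per_sec / 1_000_000 * 100


def busiest_domains(totals, top=3):
    """Return the top non-idle domains as (name, percent of one CPU), highest first."""
    busy = {name: usec for name, usec in totals.items() if name != "idle"}
    ranked = sorted(busy.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(percent_of_one_cpu(usec), 1)) for name, usec in ranked[:top]]


if __name__ == "__main__":
    for name, pct in busiest_domains(DOMAIN_TOTALS_USEC):
        print(f"{name:15s} {pct:6.1f}% of one CPU")
```

Read this way, the largest non-idle total is nwk_legacy at roughly 92% of one CPU, followed by exempt and nwk_exempt. Whether that is what the cpu_busy counter is reflecting is exactly the open question here.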
Can anyone see what I'm missing?
Thanks,
Craig