
CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

Hello Data Protectors,

 

 

I received a DFM alert saying that my CPU is too busy. The "sysstat" output showed CPU utilization between 90% and 96%. Below is the "sysstat -M" output; as you can see, the WAFL-Ex(Kahu) percentage has crossed 100%. I have heard that any domain that reaches 100% is a bottleneck. Is that true? Can anyone tell me what exactly this domain does, and whether it is the culprit in this case? If so, what is the solution?

 

 

ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host  Ops/s   CP
  83%   25%    5%    1%  29%  25%  24%  25%  42%      5%       0%      0%      1%   1%     0%     1%    101%( 80%)          0%        0%   0%     3%   2%   1%   1097   3%
  76%   33%    5%    1%  29%  24%  23%  26%  43%      5%       0%      0%      1%   2%     0%     1%    103%( 73%)          0%        0%   0%     2%   2%   1%    869   4%
  90%   27%    4%    0%  31%  24%  24%  27%  50%      4%       0%      0%      1%   1%     0%     1%    112%( 88%)          0%        0%   0%     1%   2%   3%    741   7%
  89%   18%    2%    0%  28%  18%  17%  21%  55%      3%       0%      0%      1%   1%     0%     2%    100%( 86%)          0%        0%   0%     2%   2%   1%    534   5%
  87%   35%   10%    2%  34%  29%  28%  29%  50%     11%       0%      0%      2%   3%     0%     9%     96%( 74%)          0%        9%   0%     3%   3%   1%   1807   9%
  99%   32%   12%    3%  37%  35%  32%  31%  52%     14%       0%      0%      1%   2%     0%    16%    108%( 82%)          1%        0%   0%     3%   4%   1%   3786  31%
 100%   36%   12%    3%  38%  34%  33%  32%  54%     13%       0%      0%      1%   1%     0%    17%    115%( 82%)          0%        0%   0%     2%   4%   1%   3385  14%
  99%   21%    7%    2%  33%  31%  26%  26%  48%     11%       0%      0%      1%   1%     0%    18%     93%( 80%)          0%        0%   0%     2%   3%   1%   3037  24%
  92%   29%    7%    1%  33%  31%  30%  31%  40%      9%       0%      0%      1%   1%     0%     1%    115%( 89%)          0%        0%   0%     1%   3%   1%   1953   7%
  87%   26%    3%    0%  29%  22%  23%  26%  47%      4%       0%      0%      1%   1%     0%     1%    108%( 85%)          0%        0%   0%     1%   2%   0%    703   3%
  87%   28%    3%    0%  30%  24%  24%  27%  45%      4%       0%      0%      1%   1%     0%     1%    110%( 85%)          0%        0%   0%     1%   2%   0%    649   6%
  68%   23%    3%    0%  24%  20%  19%  21%  37%      4%       0%      0%      1%   1%     0%     1%     86%( 65%)          0%        0%   0%     1%   2%   1%    816   3%
  53%   26%    5%    1%  22%  20%  19%  21%  28%      9%       0%      0%      1%   1%     0%     2%     71%( 47%)          0%        0%   0%     1%   3%   1%   1959   6%
  59%   19%    3%    0%  21%  16%  15%  17%  36%      5%       0%      0%      1%   1%     0%     1%     72%( 55%)          0%        0%   0%     1%   2%   1%   1027   4%
  62%   19%    2%    0%  21%  16%  15%  17%  38%      5%       0%      0%      1%   1%     0%     2%     74%( 57%)          0%        0%   0%     1%   2%   1%    995   4%
  55%   14%    2%    0%  19%  13%  13%  15%  34%      5%       0%      0%      1%   1%     0%     1%     63%( 51%)          0%        0%   0%     1%   2%   1%    980   5%

 

 

Thanks,
Charan

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

What version of ONTAP and what version of OCUM are you running?

Busy CPU isn't the right threshold to watch. You should be looking at average CPU.

WAFL_Ex(Kahu) is probably high because of dedupe, VSM (volume SnapMirror), or deswizzling.

What does sysstat -m 1 show?

 

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

We are using ONTAP 8.1 and DFM 3.1.

I checked snapmirror status and sis status; everything is idle.

 

 

sysstat -M 1 output:

 

ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host  Ops/s   CP
  99%    7%    1%    0%  27%  14%  16%  16%  62%      4%       0%      0%      1%   1%     0%     1%     99%( 98%)          0%        0%   0%     0%   2%   0%    661   0%
 100%    6%    1%    0%  28%  15%  15%  15%  66%      4%       0%      0%      1%   1%     0%     1%    100%( 99%)          0%        0%   0%     1%   2%   2%    462   0%
  97%   11%    2%    0%  28%  20%  17%  17%  58%      5%       0%      0%      1%   1%     0%     1%    101%( 96%)          0%        0%   0%     1%   2%   1%    659   0%
  92%   11%    2%    0%  27%  19%  20%  19%  50%      5%       0%      0%      1%   3%     0%     1%     95%( 90%)          0%        0%   0%     1%   2%   0%    817   0%
  93%   11%    1%    0%  27%  22%  17%  17%  52%      3%       0%      0%      1%   1%     0%     1%     99%( 92%)          0%        0%   0%     1%   2%   0%    533   0%
  80%    8%    2%    0%  23%  16%  13%  17%  45%      5%       0%      0%      1%   2%     0%     1%     78%( 76%)          0%        0%   0%     2%   2%   1%    946   0%
  90%   10%    2%    0%  26%  13%  11%  17%  62%      6%       0%      0%      1%   1%     0%     6%     86%( 83%)          0%        0%   0%     2%   2%   0%    638   0%
  84%    6%    1%    0%  23%  11%  11%  13%  58%      3%       0%      0%      1%   1%     0%     1%     85%( 82%)          0%        0%   0%     1%   2%   1%    504   0%
  74%   10%    2%    0%  22%  16%  14%  18%  41%      5%       0%      0%      2%   3%     0%     2%     73%( 70%)          1%        0%   0%     2%   2%   1%    931  38%
  91%    6%    1%    0%  25%  19%  15%  17%  49%      3%       0%      0%      1%   1%     0%     1%     91%( 89%)          0%        0%   0%     1%   2%   0%    504  18%
  89%    6%    1%    0%  25%  12%  13%  18%  54%      3%       0%      0%      1%   1%     0%     1%     88%( 87%)          0%        0%   0%     1%   2%   1%    649   0%

 

DFM

 Event Arguments
---------------
cpuBusyThresholdInterval: 00:15:00
cpuUtilization: 97.731
cpuTooBusyThreshold: 95

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

sysstat -m 1
ANY AVG CPU0 CPU1
100% 52% 4% 100%
95% 50% 5% 95%

 

Note the lowercase -m, not -M.

Also, it's time to upgrade. 8.1 is VERY old, and DFM 3.1 is so ancient it's not even worth talking about.

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

xxxxxxxxx> sysstat -m 1
 ANY  AVG  CPU0 CPU1 CPU2 CPU3
 97%  37%   32%  32%  31%  51%
 95%  29%   22%  19%  25%  49%
 96%  31%   22%  19%  22%  62%
 90%  25%   16%  16%  22%  45%
 98%  29%   20%  21%  20%  57%
 96%  32%   23%  23%  27%  55%
 89%  28%   23%  21%  21%  48%

 

 

 

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

See my points above.

Look at the average: your system is fine. You do need to upgrade both ONTAP and your management tools.
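
To make the "look at the average" point concrete, here is a minimal Python sketch (not a NetApp tool) that parses the sysstat -m rows posted earlier in this thread and recomputes the mean of the per-core columns. The helper names `parse_sysstat_m` and `per_core_mean` are made up for this illustration, and the parsing assumes the simple whitespace-separated layout shown above.

```python
# Rows copied from the "sysstat -m 1" output posted earlier in this thread.
SYSSTAT_M = """\
 ANY  AVG  CPU0 CPU1 CPU2 CPU3
 97%  37%   32%  32%  31%  51%
 95%  29%   22%  19%  25%  49%
 96%  31%   22%  19%  22%  62%
 90%  25%   16%  16%  22%  45%
 98%  29%   20%  21%  20%  57%
 96%  32%   23%  23%  27%  55%
 89%  28%   23%  21%  21%  48%
"""


def parse_sysstat_m(text):
    """Return one dict per sample, mapping column name -> integer percentage."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [
        {name: int(value.rstrip("%")) for name, value in zip(header, row)}
        for row in rows
    ]


def per_core_mean(row):
    """Mean of the CPUn columns, computed independently of the AVG column."""
    cores = [value for name, value in row.items() if name.startswith("CPU")]
    return sum(cores) / len(cores)


rows = parse_sysstat_m(SYSSTAT_M)
for row in rows:
    print(
        f"ANY={row['ANY']}%  AVG={row['AVG']}%  "
        f"mean(CPU0..CPU3)={per_core_mean(row):.1f}%"
    )
```

In these samples ANY stays between 89% and 98% while AVG, which stays within a point of the recomputed per-core mean, never goes above 37%.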

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

Thanks, will do that.

 

But I'm curious: is any domain whose utilization is >= 100% a bottleneck or not?

In this case, what is causing WAFL-Ex to cross 100%? No VSM, SIS, or deswizzling is running on my system.

 

 

 

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

Over 100% is not normally a bottleneck.

Since 8.1, processes have been well multi-threaded, so running over 100% just means it's using more than one core.

On our 6210 system (DOT 8.1.2, 8 cores), for example, WAFL_Ex hovers over 200% most of the time. We used to get regular DFM alerts about high CPU, and a normal sysstat showed CPU pegged at 99%, but that counter is not a good indicator on more modern multi-CPU systems.

(In later versions of DFM 5 the high CPU alarm is actually disabled by default because of this.)

As said above, the "AVG" counter in the sysstat -m output is a better indicator, as that's the average of all the cores. Unfortunately, DFM does not "understand" this counter.
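
As a rough illustration of the arithmetic, the Python sketch below converts a domain percentage into "cores' worth of work" and into a share of the whole system. It assumes, as this reply does, that a domain's percentage is measured against a single core. The function name `domain_load` is hypothetical, and the 200% figure is only an approximation of "hovers over 200%".

```python
def domain_load(domain_pct, total_cores):
    """Interpret a sysstat -M domain percentage as a share of one core.

    Returns (cores_equivalent, share_of_whole_system_pct).
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    cores_equivalent = domain_pct / 100.0        # e.g. 200% -> 2 cores' worth
    share_of_system = domain_pct / total_cores   # percent of all cores combined
    return cores_equivalent, share_of_system


# Peak WAFL_Ex value from the first sysstat -M sample in this thread (4 CPUs shown).
print(domain_load(115, 4))

# The 8-core 6210 example, taking "over 200%" as roughly 200%.
print(domain_load(200, 8))
```

Read this way, a domain at 200% on an 8-core system is consuming about a quarter of the total CPU capacity, which is why a value above 100% is not alarming by itself.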

 

 

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

Thanks Graham for the info.

 

FYI

 

Logical domain bottleneck: a logical domain reaches its concurrency limit. For example, a logical domain with a concurrency of 1 CPU core is bottlenecked when it reaches 100% utilization.

In my case it is not a bottleneck, as WAFL-Ex is parallelized and has a concurrency of more than 1 CPU.
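
The definition above can be sketched as a simple check in Python. The WAFL_Ex values are the first number in each WAFL_Ex(Kahu) cell of the first sysstat -M sample in this thread. The concurrency limits passed in (1 and 2 cores) are hypothetical, since the thread does not state the real limit for this domain on this platform, and the function names are invented for the example.

```python
# WAFL_Ex(Kahu) values (first number in each cell) from the first sysstat -M sample.
WAFL_EX_PCT = [101, 103, 112, 100, 96, 108, 115, 93,
               115, 108, 110, 86, 71, 72, 74, 63]


def at_concurrency_limit(utilization_pct, concurrency_cores):
    """True when a domain has used up every core it is allowed to run on at once."""
    return utilization_pct >= 100 * concurrency_cores


def saturated_samples(samples, concurrency_cores):
    """Return the samples that reach the limit for the given concurrency."""
    return [pct for pct in samples if at_concurrency_limit(pct, concurrency_cores)]


# Hypothetical limits: the real concurrency of WAFL_Ex is not given in this thread.
print(len(saturated_samples(WAFL_EX_PCT, concurrency_cores=1)))
print(len(saturated_samples(WAFL_EX_PCT, concurrency_cores=2)))
```

If the domain were limited to one core, most of these samples would count as saturated. With a limit of even two cores, none of them would.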

 

So I will not get these alerts if I upgrade DFM and ONTAP to the latest versions. Am I right?

Also, could you tell me whether there is any document that describes the logical domains and their activities?

 

 

 

Re: CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

Hi.

 

I'm sure I saw a post on here that had some information on the various logical domains.

Regarding the DFM upgrade: if you don't want to upgrade, there's a bug ID with further info and a workaround to disable the alert:

 

http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=612203