ONTAP Discussions

CPU domain WAFL-Ex(Kahu) utilization has crossed 100%


Hello Data Protectors,



I got a DFM alert saying that my CPU is too busy. "sysstat" output showed CPU utilization between 90% and 96%. Below is the "sysstat -M" output; as you can see, the WAFL_Ex(Kahu) percentage crossed 100%. I have heard that any domain that reaches 100% is a bottleneck. Is that true? Can anyone tell me what exactly this domain does, and whether it is the culprit in this case? If so, what is the solution?



ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host  Ops/s   CP
  83%   25%    5%    1%  29%  25%  24%  25%  42%      5%       0%      0%      1%   1%     0%     1%    101%( 80%)          0%        0%   0%     3%   2%   1%   1097   3%
  76%   33%    5%    1%  29%  24%  23%  26%  43%      5%       0%      0%      1%   2%     0%     1%    103%( 73%)          0%        0%   0%     2%   2%   1%    869   4%
  90%   27%    4%    0%  31%  24%  24%  27%  50%      4%       0%      0%      1%   1%     0%     1%    112%( 88%)          0%        0%   0%     1%   2%   3%    741   7%
  89%   18%    2%    0%  28%  18%  17%  21%  55%      3%       0%      0%      1%   1%     0%     2%    100%( 86%)          0%        0%   0%     2%   2%   1%    534   5%
  87%   35%   10%    2%  34%  29%  28%  29%  50%     11%       0%      0%      2%   3%     0%     9%     96%( 74%)          0%        9%   0%     3%   3%   1%   1807   9%
  99%   32%   12%    3%  37%  35%  32%  31%  52%     14%       0%      0%      1%   2%     0%    16%    108%( 82%)          1%        0%   0%     3%   4%   1%   3786  31%
 100%   36%   12%    3%  38%  34%  33%  32%  54%     13%       0%      0%      1%   1%     0%    17%    115%( 82%)          0%        0%   0%     2%   4%   1%   3385  14%
  99%   21%    7%    2%  33%  31%  26%  26%  48%     11%       0%      0%      1%   1%     0%    18%     93%( 80%)          0%        0%   0%     2%   3%   1%   3037  24%
  92%   29%    7%    1%  33%  31%  30%  31%  40%      9%       0%      0%      1%   1%     0%     1%    115%( 89%)          0%        0%   0%     1%   3%   1%   1953   7%
  87%   26%    3%    0%  29%  22%  23%  26%  47%      4%       0%      0%      1%   1%     0%     1%    108%( 85%)          0%        0%   0%     1%   2%   0%    703   3%
  87%   28%    3%    0%  30%  24%  24%  27%  45%      4%       0%      0%      1%   1%     0%     1%    110%( 85%)          0%        0%   0%     1%   2%   0%    649   6%
  68%   23%    3%    0%  24%  20%  19%  21%  37%      4%       0%      0%      1%   1%     0%     1%     86%( 65%)          0%        0%   0%     1%   2%   1%    816   3%
  53%   26%    5%    1%  22%  20%  19%  21%  28%      9%       0%      0%      1%   1%     0%     2%     71%( 47%)          0%        0%   0%     1%   3%   1%   1959   6%
  59%   19%    3%    0%  21%  16%  15%  17%  36%      5%       0%      0%      1%   1%     0%     1%     72%( 55%)          0%        0%   0%     1%   2%   1%   1027   4%
  62%   19%    2%    0%  21%  16%  15%  17%  38%      5%       0%      0%      1%   1%     0%     2%     74%( 57%)          0%        0%   0%     1%   2%   1%    995   4%
  55%   14%    2%    0%  19%  13%  13%  15%  34%      5%       0%      0%      1%   1%     0%     1%     63%( 51%)          0%        0%   0%     1%   2%   1%    980   5%











What version of ONTAP and what version of OCUM are you running?


Busy CPU isn't the right threshold to watch. You should be looking at average CPU.


WAFL_Ex(Kahu) is probably high because of dedupe, VSM, or deswizzling.


What does sysstat -m 1 show?



We are using ONTAP 8.1 and DFM 3.1.


I checked snapmirror status and sis status; everything is idle.



sysstat -M 1 output:


ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host  Ops/s   CP
  99%    7%    1%    0%  27%  14%  16%  16%  62%      4%       0%      0%      1%   1%     0%     1%     99%( 98%)          0%        0%   0%     0%   2%   0%    661   0%
 100%    6%    1%    0%  28%  15%  15%  15%  66%      4%       0%      0%      1%   1%     0%     1%    100%( 99%)          0%        0%   0%     1%   2%   2%    462   0%
  97%   11%    2%    0%  28%  20%  17%  17%  58%      5%       0%      0%      1%   1%     0%     1%    101%( 96%)          0%        0%   0%     1%   2%   1%    659   0%
  92%   11%    2%    0%  27%  19%  20%  19%  50%      5%       0%      0%      1%   3%     0%     1%     95%( 90%)          0%        0%   0%     1%   2%   0%    817   0%
  93%   11%    1%    0%  27%  22%  17%  17%  52%      3%       0%      0%      1%   1%     0%     1%     99%( 92%)          0%        0%   0%     1%   2%   0%    533   0%
  80%    8%    2%    0%  23%  16%  13%  17%  45%      5%       0%      0%      1%   2%     0%     1%     78%( 76%)          0%        0%   0%     2%   2%   1%    946   0%
  90%   10%    2%    0%  26%  13%  11%  17%  62%      6%       0%      0%      1%   1%     0%     6%     86%( 83%)          0%        0%   0%     2%   2%   0%    638   0%
  84%    6%    1%    0%  23%  11%  11%  13%  58%      3%       0%      0%      1%   1%     0%     1%     85%( 82%)          0%        0%   0%     1%   2%   1%    504   0%
  74%   10%    2%    0%  22%  16%  14%  18%  41%      5%       0%      0%      2%   3%     0%     2%     73%( 70%)          1%        0%   0%     2%   2%   1%    931  38%
  91%    6%    1%    0%  25%  19%  15%  17%  49%      3%       0%      0%      1%   1%     0%     1%     91%( 89%)          0%        0%   0%     1%   2%   0%    504  18%
  89%    6%    1%    0%  25%  12%  13%  18%  54%      3%       0%      0%      1%   1%     0%     1%     88%( 87%)          0%        0%   0%     1%   2%   1%    649   0%



 Event Arguments
cpuBusyThresholdInterval: 00:15:00
cpuUtilization: 97.731
cpuTooBusyThreshold: 95


sysstat -m 1
100% 52% 4% 100%
95% 50% 5% 95%


That should be lowercase -m.


Also, time to upgrade .. 8.1 is VERY old, and DFM 3.1 is so ancient it's not even worth talking about.... 


xxxxxxxxx> sysstat -m 1
 97%  37%   32%  32%  31%  51%
 95%  29%   22%  19%  25%  49%
 96%  31%   22%  19%  22%  62%
 90%  25%   16%  16%  22%  45%
 98%  29%   20%  21%  20%  57%
 96%  32%   23%  23%  27%  55%
 89%  28%   23%  21%  21%  48%





See my points above.


Look at the average; your system is fine. You need to upgrade both ONTAP and your management tools.


Thanks, will do that.


But I am curious to know whether any domain utilization that is >= 100% is a bottleneck or not.


In this case, what is causing WAFL_Ex to cross 100%? No VSM, SIS, or deswizzling is running on my system.





Over 100% is not normally a bottleneck.


Since 8.1, processes have been well multi-threaded, so running over 100% just means it's using more than one core.


On our 6210 system (DOT 8.1.2, 8 cores), for example, WAFL_Ex hovers above 200% most of the time. We used to get regular DFM alerts about high CPU, and a normal sysstat showed the CPU pegged at 99%, but that counter is not a good indicator on more modern multi-CPU systems.

(In later versions of DFM 5 the high-CPU alarm is actually disabled by default because of this.)


As said above, the "avg" counter in the sysstat -m output is a better indicator, as that's the average of all the cores. Unfortunately, DFM does not "understand" this counter.
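
To make the difference concrete, here is a small Python sketch that recomputes the average from the per-core columns of the first "sysstat -M 1" sample posted earlier in this thread (CPU0..CPU3 = 14%, 16%, 16%, 62%, with ANY1+ at 99%). The function name average_cpu is just an illustration, not anything from ONTAP or DFM, and reading ANY1+ as "at least one core was busy" is my understanding rather than something confirmed in this thread.

```python
def average_cpu(per_core_busy):
    """Return the mean busy percentage across all cores."""
    return sum(per_core_busy) / len(per_core_busy)


# CPU0..CPU3 from the first 'sysstat -M 1' sample in this thread
cores = [14, 16, 16, 62]
# ANY1+ column from the same sample (assumed: time at least one core was busy)
any1_plus = 99

avg = average_cpu(cores)
print(f"ANY1+ = {any1_plus}%, average across cores = {avg:.0f}%")
```

The result matches the 27% shown in the AVG column of that sample, which is why a system can look "99% busy" on one counter while most of its cores are largely idle.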




Thanks Graham for the info.




Logical domain bottleneck: a logical domain reaches its concurrency limit, for example when a logical domain has a concurrency of 1 CPU core and reaches 100% utilization.
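
As a rough Python sketch of that definition (the helper names and the concurrency value of 4 for WAFL_Ex are hypothetical; the thread does not say what the real concurrency limit of any domain is):

```python
import math


def is_domain_bottleneck(utilization_pct, concurrency_cores):
    """True when a domain has reached its concurrency limit.

    A domain that can run on N cores at once tops out at N * 100%.
    """
    limit_pct = 100.0 * concurrency_cores
    return utilization_pct >= limit_pct


def min_cores_in_use(utilization_pct):
    """Smallest number of cores that could account for this utilization."""
    return math.ceil(utilization_pct / 100.0)


# A single-core domain at 100% has hit its limit.
print(is_domain_bottleneck(100, 1))
# WAFL_Ex at 115%, ASSUMING a concurrency of 4 cores (hypothetical value).
print(is_domain_bottleneck(115, 4))
# 115% means the work was spread over at least two cores.
print(min_cores_in_use(115))
```

Under those assumptions, 100% on a single-core domain counts as a bottleneck, while 115% on a parallelized domain does not.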


In my case it is not a bottleneck, as WAFL_Ex is parallelized and has a concurrency of more than 1 CPU core.


So I will not get these alerts if I upgrade DFM and ONTAP to the latest versions. Am I right?


Could you also tell me whether there is any document that describes the logical domains and their activities?







I'm sure I saw a post on here that had some information on the various logical domains.


Regarding the DFM upgrade: if you don't want to upgrade, there's a bug ID with further info and a workaround to disable the alert:



