ONTAP Discussions

CPU domain WAFL-Ex(Kahu) utilization has crossed 100%

netappwala

Hello Data Protectors,

 

 

I got a DFM alert saying that my CPU is too busy. "sysstat" output showed CPU utilization between 90% and 96%. Below is the "sysstat -M" output; as you can see, the WAFL-Ex(Kahu) percentage crossed 100%. I heard that any domain that reaches 100% is a bottleneck. Is that true? Can anyone tell me what exactly this domain does, and whether it is the culprit in this case? If yes, what is the solution?

 

 

ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host  Ops/s   CP
  83%   25%    5%    1%  29%  25%  24%  25%  42%      5%       0%      0%      1%   1%     0%     1%    101%( 80%)          0%        0%   0%     3%   2%   1%   1097   3%
  76%   33%    5%    1%  29%  24%  23%  26%  43%      5%       0%      0%      1%   2%     0%     1%    103%( 73%)          0%        0%   0%     2%   2%   1%    869   4%
  90%   27%    4%    0%  31%  24%  24%  27%  50%      4%       0%      0%      1%   1%     0%     1%    112%( 88%)          0%        0%   0%     1%   2%   3%    741   7%
  89%   18%    2%    0%  28%  18%  17%  21%  55%      3%       0%      0%      1%   1%     0%     2%    100%( 86%)          0%        0%   0%     2%   2%   1%    534   5%
  87%   35%   10%    2%  34%  29%  28%  29%  50%     11%       0%      0%      2%   3%     0%     9%     96%( 74%)          0%        9%   0%     3%   3%   1%   1807   9%
  99%   32%   12%    3%  37%  35%  32%  31%  52%     14%       0%      0%      1%   2%     0%    16%    108%( 82%)          1%        0%   0%     3%   4%   1%   3786  31%
 100%   36%   12%    3%  38%  34%  33%  32%  54%     13%       0%      0%      1%   1%     0%    17%    115%( 82%)          0%        0%   0%     2%   4%   1%   3385  14%
  99%   21%    7%    2%  33%  31%  26%  26%  48%     11%       0%      0%      1%   1%     0%    18%     93%( 80%)          0%        0%   0%     2%   3%   1%   3037  24%
  92%   29%    7%    1%  33%  31%  30%  31%  40%      9%       0%      0%      1%   1%     0%     1%    115%( 89%)          0%        0%   0%     1%   3%   1%   1953   7%
  87%   26%    3%    0%  29%  22%  23%  26%  47%      4%       0%      0%      1%   1%     0%     1%    108%( 85%)          0%        0%   0%     1%   2%   0%    703   3%
  87%   28%    3%    0%  30%  24%  24%  27%  45%      4%       0%      0%      1%   1%     0%     1%    110%( 85%)          0%        0%   0%     1%   2%   0%    649   6%
  68%   23%    3%    0%  24%  20%  19%  21%  37%      4%       0%      0%      1%   1%     0%     1%     86%( 65%)          0%        0%   0%     1%   2%   1%    816   3%
  53%   26%    5%    1%  22%  20%  19%  21%  28%      9%       0%      0%      1%   1%     0%     2%     71%( 47%)          0%        0%   0%     1%   3%   1%   1959   6%
  59%   19%    3%    0%  21%  16%  15%  17%  36%      5%       0%      0%      1%   1%     0%     1%     72%( 55%)          0%        0%   0%     1%   2%   1%   1027   4%
  62%   19%    2%    0%  21%  16%  15%  17%  38%      5%       0%      0%      1%   1%     0%     2%     74%( 57%)          0%        0%   0%     1%   2%   1%    995   4%
  55%   14%    2%    0%  19%  13%  13%  15%  34%      5%       0%      0%      1%   1%     0%     1%     63%( 51%)          0%        0%   0%     1%   2%   1%    980   5%

 

 

Thanks,
Charan

1 ACCEPTED SOLUTION

colin_graham

Hi.

 

I'm sure I saw a post on here that had some information on the various logical domains.

 

Regarding the DFM upgrade: if you don't want to upgrade, there's a bug ID with further info and a workaround to disable the alert:

 

http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=612203

 


9 REPLIES 9

JGPSHNTAP

What version of ONTAP and what version of OCUM?

Busy CPU isn't the correct threshold. You should be looking at average CPU.

WAFL Kahu is probably high because of dedupe, VSM, or deswizzling.

What does sysstat -m 1 show?

 

netappwala

We are using ONTAP 8.1 and DFM 3.1.

I checked snapmirror status and sis status; everything is idle.

 

 

sysstat -M 1 output:

 

ANY1+ ANY2+ ANY3+ ANY4+  AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host  Ops/s   CP
  99%    7%    1%    0%  27%  14%  16%  16%  62%      4%       0%      0%      1%   1%     0%     1%     99%( 98%)          0%        0%   0%     0%   2%   0%    661   0%
 100%    6%    1%    0%  28%  15%  15%  15%  66%      4%       0%      0%      1%   1%     0%     1%    100%( 99%)          0%        0%   0%     1%   2%   2%    462   0%
  97%   11%    2%    0%  28%  20%  17%  17%  58%      5%       0%      0%      1%   1%     0%     1%    101%( 96%)          0%        0%   0%     1%   2%   1%    659   0%
  92%   11%    2%    0%  27%  19%  20%  19%  50%      5%       0%      0%      1%   3%     0%     1%     95%( 90%)          0%        0%   0%     1%   2%   0%    817   0%
  93%   11%    1%    0%  27%  22%  17%  17%  52%      3%       0%      0%      1%   1%     0%     1%     99%( 92%)          0%        0%   0%     1%   2%   0%    533   0%
  80%    8%    2%    0%  23%  16%  13%  17%  45%      5%       0%      0%      1%   2%     0%     1%     78%( 76%)          0%        0%   0%     2%   2%   1%    946   0%
  90%   10%    2%    0%  26%  13%  11%  17%  62%      6%       0%      0%      1%   1%     0%     6%     86%( 83%)          0%        0%   0%     2%   2%   0%    638   0%
  84%    6%    1%    0%  23%  11%  11%  13%  58%      3%       0%      0%      1%   1%     0%     1%     85%( 82%)          0%        0%   0%     1%   2%   1%    504   0%
  74%   10%    2%    0%  22%  16%  14%  18%  41%      5%       0%      0%      2%   3%     0%     2%     73%( 70%)          1%        0%   0%     2%   2%   1%    931  38%
  91%    6%    1%    0%  25%  19%  15%  17%  49%      3%       0%      0%      1%   1%     0%     1%     91%( 89%)          0%        0%   0%     1%   2%   0%    504  18%
  89%    6%    1%    0%  25%  12%  13%  18%  54%      3%       0%      0%      1%   1%     0%     1%     88%( 87%)          0%        0%   0%     1%   2%   1%    649   0%

 

DFM

 Event Arguments
---------------
cpuBusyThresholdInterval: 00:15:00
cpuUtilization: 97.731
cpuTooBusyThreshold: 95

JGPSHNTAP

sysstat -m 1
ANY AVG CPU0 CPU1
100% 52% 4% 100%
95% 50% 5% 95%

 

Lowercase -m.

Also, time to upgrade. 8.1 is VERY old, and DFM 3.1 is so ancient it's not even worth talking about.

netappwala

xxxxxxxxx> sysstat -m 1
 ANY  AVG  CPU0 CPU1 CPU2 CPU3
 97%  37%   32%  32%  31%  51%
 95%  29%   22%  19%  25%  49%
 96%  31%   22%  19%  22%  62%
 90%  25%   16%  16%  22%  45%
 98%  29%   20%  21%  20%  57%
 96%  32%   23%  23%  27%  55%
 89%  28%   23%  21%  21%  48%

 

 

 

JGPSHNTAP

See my points above.

Look at the average; your system is fine. You need to upgrade both ONTAP and your management tools.

netappwala

Thanks, will do that.

 

But I'm curious: is any domain whose utilization is >= 100% a bottleneck or not?

In this case, what is causing WAFL-Ex to cross 100%? No VSM, SIS, or deswizzling is running on my system.

 

 

 

colin_graham

Over 100% is not normally a bottleneck.

Since 8.1, processes have been well multi-threaded, so running over 100% just means the domain is using more than one core.

On our 6210 system (Data ONTAP 8.1.2, 8 cores), for example, WAFL_Ex hovers over 200% most of the time. We used to get regular DFM alerts about high CPU, and a normal sysstat showed CPU pegged at 99%, but that counter is not a good indicator on more modern multi-CPU systems.

(In later versions of DFM 5 the high CPU alarm is actually disabled by default because of this.)

As said above, the "AVG" counter in the sysstat -m output is a better indicator, as that's the average of all the cores. Unfortunately, DFM does not "understand" this counter.
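
To make the ANY versus AVG difference concrete, here is a rough Python sketch (not a NetApp tool, just an illustration) that parses the sysstat -m 1 samples posted earlier in this thread. The function names parse_sysstat_m and mean are made up for this example, and it assumes the output keeps the simple whitespace-separated layout shown above.

```python
# Samples copied from the `sysstat -m 1` output posted earlier in the thread.
SAMPLE = """\
 ANY  AVG  CPU0 CPU1 CPU2 CPU3
 97%  37%   32%  32%  31%  51%
 95%  29%   22%  19%  25%  49%
 96%  31%   22%  19%  22%  62%
 90%  25%   16%  16%  22%  45%
 98%  29%   20%  21%  20%  57%
 96%  32%   23%  23%  27%  55%
 89%  28%   23%  21%  21%  48%
"""


def parse_sysstat_m(text):
    """Return one dict per sample row, mapping column name -> integer percent.

    Hypothetical helper: assumes a header row followed by rows of 'NN%' cells.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [
        {name: int(cell.rstrip("%")) for name, cell in zip(header, row)}
        for row in rows
    ]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


samples = parse_sysstat_m(SAMPLE)
any_mean = mean(s["ANY"] for s in samples)
avg_mean = mean(s["AVG"] for s in samples)
print(f"mean ANY = {any_mean:.1f}%   mean AVG = {avg_mean:.1f}%")

# Per row, compare the reported AVG with the mean of the per-core columns.
for s in samples:
    core_mean = mean(v for k, v in s.items() if k.startswith("CPU"))
    print(f"AVG reported {s['AVG']}%  per-core mean {core_mean:.2f}%")
```

On these samples the ANY column stays in the 90s while AVG sits around 30%, which is the gap the DFM "CPU too busy" alert does not account for.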

 

 

netappwala

Thanks Graham for the info.

 

FYI

 

Logical domain bottleneck: a logical domain reaches its concurrency limit, for example when a logical domain with a concurrency of 1 CPU core reaches 100% utilization.

In my case it is not a bottleneck, as WAFL-Ex is parallelized and has a concurrency of more than 1 CPU.
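
A small Python sketch of that rule, using a few WAFL_Ex(Kahu) cells from the sysstat -M output in this thread. The helper names are made up for illustration, and the max_cores values passed in are assumptions: the thread does not state the real concurrency limit of WAFL_Ex on this system, only that it is more than one core.

```python
import re

# Matches cells such as "101%( 80%)" from the WAFL_Ex(Kahu) column.
CELL = re.compile(r"(\d+)%\(\s*(\d+)%\)")


def parse_cell(cell):
    """Split 'NNN%( MM%)' into (outer, parenthesized) integer percentages."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def is_saturated(utilization_pct, max_cores):
    """True when a domain is using every core it is allowed to run on.

    max_cores is a concurrency limit supplied by the caller (an assumption
    here, not a value read from the system).
    """
    return utilization_pct >= 100 * max_cores


cells = ["101%( 80%)", "103%( 73%)", "112%( 88%)", "115%( 82%)"]
for cell in cells:
    total, inner = parse_cell(cell)
    print(
        f"{cell}: about {total / 100:.2f} cores in use, "
        f"saturated if limit is 1 core: {is_saturated(total, 1)}, "
        f"if limit is 4 cores (assumed): {is_saturated(total, 4)}"
    )
```

With a limit of one core, any reading at or above 100% counts as saturated; with a higher limit the same readings leave headroom, which matches the conclusion above.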

 

So I will not be getting these alerts if I upgrade DFM and ONTAP to the latest versions. Am I right?

Also, could you tell me whether there is any document that has information about the logical domains and their activities?

 

 

 

