ONTAP Hardware

High CPU: wafl_exempt

mdvillanueva
15,001 Views

Hi,

I am monitoring what is causing high CPU utilization, and I am seeing "domain_busy:wafl_exempt" shoot up whenever the CPU utilization shoots up. Does anyone know what wafl_exempt pertains to?

Thanks,

Maico


AGUMADAVALLI
14,913 Views

Hi there,

Multi-processor safe wafl_exempt threads can run in parallel on multiple processors. These threads handle WAFL tasks such as writes, reads, and readdir. AFAIK, the purpose is to increase WAFL performance.

Case 1:

If the filer acts as a SnapMirror destination, I assume it is busy running the deswizzler after a SnapMirror update.

Has anything changed in the SnapMirror configuration?

But as you said, a busy CPU is not a problem per se. The filer runs these system background jobs (like the deswizzler) while there is no user workload to serve.

If user workload is introduced into the system, Data ONTAP will throttle down these system jobs to focus on this protocol/user workload.

Case 2:

The ONTAP 8.1 upgrade alters how the CPUs are utilised on the filer, meaning the CPU monitoring characteristics change after the 8.1 upgrade.

A deswizzling process takes place after the upgrade (shown by "wafl scan status").

Looking at "sysstat -m" on the filer showed that the actual CPU usage was spread more evenly across the cores and was considerably lower than the "CPU ANY" value shown in the DFM graph.
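To illustrate why an "any CPU busy" figure can sit far above the per-core numbers, here is a rough Python sketch. It assumes that "CPU ANY" means the share of time in which at least one core was busy; the sample data and function names are made up for illustration and are not taken from any filer.

```python
from typing import List

def per_core_utilization(samples: List[List[bool]]) -> List[float]:
    """Percent of time slices in which each individual core was busy."""
    cores = len(samples[0])
    return [100.0 * sum(s[c] for s in samples) / len(samples) for c in range(cores)]

def any_core_utilization(samples: List[List[bool]]) -> float:
    """Percent of time slices in which at least one core was busy."""
    return 100.0 * sum(any(s) for s in samples) / len(samples)

# Hypothetical data: 4 cores, 8 time slices. Each core is busy in two
# slices, and no two cores are busy in the same slice.
samples = [[(t // 2) == c for c in range(4)] for t in range(8)]

print(per_core_utilization(samples))
print(any_core_utilization(samples))
```

With this made-up data every core is only busy a quarter of the time, yet some core is busy in every slice, so the "any" figure is at its maximum. Work that is spread evenly over the cores pushes the "any" number up without any single core being saturated.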

Thank you,
AK G

mdvillanueva
14,913 Views

Hi AKG,

Nothing has changed in SnapMirror and I have none running at the moment; however, we recently upgraded to 8.1.1. I get alerted of high latency (meaning it passed the threshold I set) on some LUNs and volumes. The latency shoots up at the same time as the CPU (I believe this is what you mean by CPU_any). But when I check each of the four physical processors, I see that wafl_exempt went up.

So does wafl_exempt relate to non-system activity on the machine, which normally shows in the Kahuna domain?

AGUMADAVALLI
14,913 Views

Yes, the upgrade is one reason for this, and it will not cause any issue with serving the data at all. It will disappear as it progresses.

thank you,

AK G

AGUMADAVALLI
14,913 Views

It normalizes as the WAFL adjustments triggered by the upgrade process are completed. It may take a couple of days too.

thank you,

AK G

mdvillanueva
14,913 Views

Can it take more than a week? Because that’s how long it has been. It is doing the aggr scrubbing due to the rlw_upgrading.

aborzenkov
14,913 Views

It is exactly backwards ☺ rlw_upgrading progresses as the scrub runs. It does not cause the scrub to run by itself.

mdvillanueva
14,913 Views

yup, I guess that’s what I meant. I actually have to run the scrub daily until it completes.

PALEXOPOULOS
14,913 Views

AGUMADAVALLI has the correct answer.

You can finish your scrubs, but it will make no difference to your high CPU.

We are experiencing the same issue, where the console is almost unusable and a 5-minute perfstat might take 20 minutes to run. (This is a little worrying, but apparently nothing to worry about.)

Before we upgraded to 8.1 and subsequently 8.1.1, we had the same SnapMirror schedule and never saw the filer with high CPU or deswizzling so much. You will notice that if you turn SnapMirror off, the CPU stabilises back to normal.

If the deswizzler finishes in time before the next mirror, the CPU will also drop right off.

"If user workload is introduced into the system, Data ONTAP will throttle down these system jobs to focus on this protocol/user workload." This is evident if you enter "priv set diag" and run the command "wafl scan speed"; you should see it throttle back...

It's annoying as DFM seems to spam us with CPU alerts all the time. We have not seen increase in latency but our workload is quite minimal atm. (purely CIFS).

Hope this helps, but don't expect your reported high CPU to drop once rlw_upgrading finishes. (You should still finish these scrubs.)

uptimenow
14,913 Views

Hi Paul, do you have any more info on where you are with this issue right now? I am seeing high WAFL_exempt CPU usage on a customer's 8.1.2 system. By no means as bad as your description, but still worryingly high ...

chad_petrie
12,223 Views

I am also curious to know how people resolved this. After our 8.1.2 upgrade on a pair of v3270s we see elevated CPU, specifically in the WAFL_exempt counters. DFM reports our filer consistently at 80%+ CPU utilization, while "sysstat -m" shows the procs bouncing between 45% and 70%.

This is not a snapmirror destination so I do not see any deswizzle operations active. All I see are active bitmap rearrangements.

Any help on what to look at next is appreciated.

thomas_glodde
12,223 Views

Be sure to upgrade to ONTAP 8.1.2P4, as it fixes a severe bug affecting MetroClusters and systems with high space utilization.

uptimenow
12,223 Views

I am wondering: how many of you who are reporting high WAFL_Ex utilization are using PAM/FlashCache?

I noticed on an 8.1.2 system with PAM, high hit ratios and lots of IOPS coming from the PAM cards that the WAFL_Ex domain utilization was very high. When I disabled the PAM card, WAFL_Ex utilization was noticeably lower (albeit at the cost of higher disk utilization) and far less spiky. Is there any scientific explanation for this, i.e. are PAM requests served by calls to the WAFL_Ex domain?

proby
12,223 Views

Some decrease in PAM/Flashcache caching percentage is expected immediately post-DOT upgrade (cache re-warming.)

By chance, are there flashpools configured and in use on the storage system?

chad_petrie
12,223 Views

I do have PAM cards installed in this system. I do not have any flashpools.

On average over a week, I would say this system is replacing about 2000 reads, with spikes to 3500. Those replaced reads equate to about a 45-55% hit rate.

TOMASYAWORKS
10,233 Views

We have 8.2 installed and also see high wafl_exempt, >150%. We have PAM, which replaces 6000 reads. Before that we had 8.01.
