High CPU: wafl_exempt

Hi,

I am monitoring what is causing the high CPU, and I see "domain_busy:wafl_exempt" shoot up whenever the CPU utilization shoots up. Does anyone know what wafl_exempt pertains to?

Thanks,

Maico

Re: High CPU: wafl_exempt

Hi there,

Multiprocessor-safe wafl_exempt threads can run in parallel on multiple processors. These threads handle WAFL tasks such as writes, reads, and readdir. AFAIK, the purpose is to increase WAFL performance.

Case 1:

If the filer acts as a SnapMirror destination, I assume it is busy running the deswizzler after a SnapMirror update.

Has anything changed in the SnapMirror configuration?

But as you said, a busy CPU is not a problem per se. The filer runs these system background jobs (like the deswizzler) while there is no user workload to serve.

If user workload is introduced into the system, Data ONTAP will throttle down these system jobs to focus on this protocol/user workload.

Case 2:

The ONTAP 8.1 upgrade alters how the CPUs are utilised on the filer, meaning the CPU monitoring characteristics change after the 8.1 upgrade.

A deswizzling process also takes place after the upgrade (shown by "wafl scan status").

A "sysstat -m" on the filer showed that the actual CPU usage was spread more evenly across the cores and was considerably lower than the "CPU ANY" figure shown in the DFM graph.
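To illustrate why an "any CPU" figure can sit far above the per-core numbers, here is a minimal Python sketch. It assumes that "CPU ANY" means the share of time in which at least one core was busy, and that the average is the mean of the per-core busy percentages. The sample data and the function names are made up for illustration and are not taken from sysstat or DFM.

```python
# Toy model: each row is one sampling interval, each column is one core.
# True means that core was busy during that interval. The data is invented.
samples = [
    [True,  False, False, False],
    [False, True,  False, False],
    [False, False, True,  False],
    [False, False, False, True],
    [True,  True,  False, False],
    [False, False, False, False],
    [False, False, True,  True],
    [False, True,  False, False],
]


def per_core_busy(samples):
    """Percent of intervals in which each core was busy."""
    intervals = len(samples)
    cores = len(samples[0])
    return [100.0 * sum(row[c] for row in samples) / intervals
            for c in range(cores)]


def cpu_avg(samples):
    """Mean of the per-core busy percentages."""
    per_core = per_core_busy(samples)
    return sum(per_core) / len(per_core)


def cpu_any(samples):
    """Percent of intervals in which at least one core was busy (assumed meaning of ANY)."""
    return 100.0 * sum(any(row) for row in samples) / len(samples)


print("per core:", per_core_busy(samples))
print("avg:", cpu_avg(samples))
print("any:", cpu_any(samples))
```

With this made-up data no single core is busy more than 37.5% of the time and the average is about 28%, yet the "any" figure is 87.5%, because the busy intervals of the different cores rarely overlap.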

Thank you,
AK G

Re: High CPU: wafl_exempt

Hi AKG,

Nothing has changed in SnapMirror and I have none running at the moment; however, we recently upgraded to 8.1.1. I get alerted about high latency (meaning it passed the threshold I set) on some LUNs and volumes. The latency shoots up at the same time as the CPU (I believe this is what you mean by CPU_any). But when I check each of the four physical processors, I see that wafl_exempt went up.

So does wafl_exempt relate to non-system activity on the machine, which normally shows up in the Kahuna domain?

Re: High CPU: wafl_exempt

Yes, the upgrade is the one reason for this, and it will not cause any issue with serving the data at all. It will disappear as it progresses.

thank you,

AK G

Re: High CPU: wafl_exempt

It normalizes as the WAFL adjustments triggered by the upgrade process are completed. That may take a couple of days.

thank you,

AK G

Re: High CPU: wafl_exempt

Can it take more than a week? Because that’s how long it has been. It is doing the aggr scrubbing due to the rlw_upgrading.

Re: High CPU: wafl_exempt

It is exactly backwards ☺ rlw_upgrading progresses as the scrub runs. It does not cause the scrub to run by itself.

Re: High CPU: wafl_exempt

Yup, I guess that’s what I meant. I actually have to run the scrub daily until it completes.

Re: High CPU: wafl_exempt

AGUMADAVALLI has the correct answer.

You can finish your scrubs, but it will make no difference to your high CPU.

We are experiencing the same issue: the console is almost unusable and a 5 min perfstat might take 20 mins to run. (This is a little worrying, but apparently nothing to worry about.)

Before we upgraded to 8.1 and subsequently to 8.1.1, we had the same SnapMirror schedule and never saw the filer with high CPU or deswizzling so much. You will notice that if you turn SnapMirror off, the CPU stabilises back to normal.

If the deswizzler finishes in time before the next mirror, the CPU will also drop right off.

"If user workload is introduced into the system, Data ONTAP will throttle down these system jobs to focus on this protocol/user workload." This is evident if you enter "priv set diag" and run the command "wafl scan speed"; you should see it throttle back...

It's annoying, as DFM seems to spam us with CPU alerts all the time. We have not seen an increase in latency, but our workload is quite minimal atm (purely CIFS).

Hope this helps, but don't expect your reported high CPU to drop once rlw_upgrading finishes. (You should still finish these scrubs.)

Re: High CPU: wafl_exempt

Hi Paul, do you have any more info on where you are with this issue right now? I am seeing high wafl_exempt CPU usage on a customer's 8.1.2 system. By no means as bad as your description, but still worryingly high ...