ONTAP Hardware
Hi
We're in the process of migrating volumes to a different aggregate. Things were going smoothly once I started running the moves unthrottled, but I was hit with the following error message yesterday:
wafl.memory.statusLowMemory: WAFL is running low on memory, with 8XXMB remaining.
I did some preliminary checking and I assume this is due to the volume migration putting a heavy load on the PAM cards. Additionally, the cluster LIFs were not at home.
How serious of an issue is this and could it potentially lead to devastating consequences? Please advise.
If this is truly urgent, please open a case with our support team.
I fully intend to.
However, if anyone knows the cause, I would appreciate an answer nonetheless.
I second what Drew said.
Vol moves shouldn't affect the system like that; they are background processes. This could be a bug though.
How many volumes are moving without throttling?
You can only do a few at a time unless you want to run out of memory.
Try doing fewer moves. Wait until some finish. Do more.
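The "fewer moves, wait, do more" approach can be sketched as a loop with a concurrency cap. This is only an illustration, not a NetApp tool: `run_moves_in_batches`, `start_move`, `is_finished`, and `wait` are hypothetical names. On a real cluster the first two callables would presumably wrap `volume move start` and a poll of `volume move show`, but that wiring is an assumption and is left out here.

```python
from collections import deque
from typing import Callable, Iterable, List


def run_moves_in_batches(
    volumes: Iterable[str],
    max_concurrent: int,
    start_move: Callable[[str], None],   # hypothetical: kicks off one move
    is_finished: Callable[[str], bool],  # hypothetical: polls one move
    wait: Callable[[], None],            # hypothetical: pause between polls
) -> List[str]:
    """Keep at most max_concurrent moves in flight at any time.

    Returns the volumes in the order their moves were seen to finish.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    pending = deque(volumes)
    in_flight: List[str] = []
    finished: List[str] = []

    while pending or in_flight:
        # Start new moves only while there is a free slot under the cap.
        while pending and len(in_flight) < max_concurrent:
            vol = pending.popleft()
            start_move(vol)
            in_flight.append(vol)

        # Check progress and free the slots of completed moves.
        running_before = len(in_flight)
        still_running: List[str] = []
        for vol in in_flight:
            if is_finished(vol):
                finished.append(vol)
            else:
                still_running.append(vol)
        in_flight = still_running

        # Nothing completed this round: pause before polling again.
        if in_flight and len(in_flight) == running_before:
            wait()

    return finished
```

Passing something like `lambda: time.sleep(60)` as `wait` keeps the polling light. A lower `max_concurrent` trades migration speed for less pressure on the controller.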
I'd usually move 4 volumes without throttling on weekdays and a few more (8-10) on the weekend.
I've been doing this for about 2 months now, so I'm wondering why this wasn't an issue before. Will going easier on the system resolve the issue?
Are you on 9.6 but not on the most recent patch release (e.g. GA, P1, or P2)? We've seen some similar issues. If you haven't opened a case, please do so as a P1.
We're on 9.3P3 currently.
We were in the middle of upgrading our systems before the pandemic hit. We plan on upgrading to 9.5P10 when we get the chance.
And yes, I've created a case with NetApp.
Very good. Case number? Feel free to PM me if you want.
No problem, the case number is #2008299487
I'd like to ask a question: what happens if I run out of WAFL memory?
Would it cause the move jobs to fail, or would the system itself be affected? And is it a known bug for ONTAP 9.3P3? I can't seem to find it anywhere in the documentation.
Yes. See https://mysupport.netapp.com/NOW/cgi-bin/relcmp.on?notfirst=Go%21&rels=9.3P3%2C9.3P17&what=fix and Ctrl+F for "memory" for examples.
To confirm, Support's Core Analysis team would need a core file from a panic. If you run out of memory the system will panic.
It might be helpful to get some performance data if possible. 9.3 may have some of the relevant counters, so a perf archive covering the time of the issue would be good to review.
Two more error messages have cropped up over the weekend as I migrated more volumes.
The first was wafl.readdir.expired and the second was wafl.cp.toolong. The first error message is a bug, according to NetApp. Corrective action seems to come down to restructuring the directory, which simply isn't feasible. I assume upgrading will resolve this issue.
The second error message is directly related to my migration attempts as well, and there doesn't appear to be anything I can do to address that one. WAFL seems to be the consistent culprit.
I still have a case open with NetApp, but I'm hoping that someone here can shed light on the issue.
At a minimum you should be on 9.3P7 to avoid this bug. It affects various processes.
https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1118890
In some cases, entire user space memory pages become corrupted and cause memory failures.
Yikes, that sounds terrifying.
Thanks for alerting me; unfortunately, updating won't be feasible due to the pandemic. How risky is it for me to remain on 9.3P3?
The wafl.readdir.expired event can still occur on newer versions, although 9.5 has some readdir optimizations.