ONTAP Hardware
We're dealing with this bug on our FAS3240 systems. Wanted to see who else has this issue and whether you got it resolved. What did you do to fix it? How long did it take you?
Stale metadata not automatically removed as part of the 'sis start' operation on the volume when running Data ONTAP® 8.1x
KB ID: 7010056 Version: 13.0 Published date: 06/23/2014 Views: 8775
We have a few customers who were hit by that problem. You can see it if your "sis status -l" prints out incredibly huge numbers for "stale metadata" (in one case we had something like 3500% stale metadata, but anything over 30% or so might indicate a problem).
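As a rough illustration of that rule of thumb, here is a minimal Python sketch that flags volumes whose stale metadata percentage is above 30%. The threshold is only the informal figure from this post, not an official NetApp limit. The function name, the volume names, and the sample values are hypothetical, and the percentages are assumed to have been read off "sis status -l" by hand or by your own parser.

```python
# Rule-of-thumb threshold taken from this thread; not an official NetApp limit.
STALE_THRESHOLD_PCT = 30.0


def flag_stale_volumes(stale_pct_by_volume, threshold=STALE_THRESHOLD_PCT):
    """Return (volume, percent) pairs above the threshold, worst first.

    stale_pct_by_volume maps a volume name to the stale metadata
    percentage you collected for it (hypothetical input format).
    """
    flagged = [
        (volume, pct)
        for volume, pct in stale_pct_by_volume.items()
        if pct > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = {"vol_a": 3500.0, "vol_b": 12.0, "vol_c": 45.0}
    for volume, pct in flag_stale_volumes(sample):
        print(f"{volume}: {pct:.0f}% stale metadata, worth investigating")
```

A volume sitting exactly at 30% is not flagged, since the post only says "over 30% or so"; adjust the threshold to taste.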
If you're hit by that bug, I found that the only solution to 100% fix it is the following:
This has always fixed it for us. Note that we had a few cases with 8.1.2P4 where a simple "sis start -s" after the upgrade did not help; we had to do a "sis reset"
-Michael
Michael, thank you for the reply.
I believe we have it resolved now.
I upgraded our nodes to 8.2.1P1 early Wednesday morning and let dedup run on its schedule twice on the volumes afterwards (Wed. night and Thursday night) as stated in the KB.
After running the math again based on NetApp's formula we saw the stale fingerprints reduce. Some volumes were in the 500 - 900% range and are now down to 8 - 20% range.
Yes, indeed. A managed services client had a LUN go offline because of this. Looking at the used storage, I couldn't figure out where it all was. One support case later and I find out this is the problem. The volume had about 700GB of stale metadata which probably took 8 hours or so to reduce. We found the bug after we upgraded to 8.2.1 from a buggy 8.1.2 release. It just so happened the volume filled up the morning after the upgrade. Awesome timing.
Mike Brown
VMware, Cisco Data Center, and NetApp dude
Consulting Engineer
Twitter: @VIRTUALLYMIKEB
Blog: http://VirtuallyMikeBrown.com
LinkedIn: http://LinkedIn.com/in/michaelbbrown
To add another tidbit to this.
We originally engaged NetApp performance, because when we tried to commit VMware snapshots that included a memory snapshot, it would completely paralyze our controllers. Now that we have upgraded and run dedup twice to remove the stale fingerprints, our issue is no more. We're able to commit the snapshots now and the controllers keep rolling.
Not sure if there was another bug or the stale fingerprints. Either way, not fun.
Hi, can someone email me a copy of this article at mancusomjm@gmail.com? I can't see it.
We too have this problem at my work, and the bug is affecting all versions of ONTAP.
http://mysupport.netapp.com/NOW/cgi-bin/bugrellist?bugno=931439
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365503(v=vs.85).aspx
No ETA from NetApp as to when we can expect the fix
Just repeating what NetApp support advised:
"
I have found our answer in bug 931439. It is new and very much still under investigation, but I'll outline what's going on. When you do a copy in this manner, Windows tries to create a SIS link instead of rehydrating the file immediately.
We can see that this is the case from the trace in packet 1842.
.... .... .... .... .... ..1. .... .... = Sparse: A SPARSE file
.... .... .... .... .... .1.. .... .... = Reparse Point: Has an associated REPARSE POINT
Here is more on reparse points.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365503(v=vs.85).aspx
"
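For anyone reading a similar trace, the two flagged bits in the quoted packet correspond to the Windows file attribute constants FILE_ATTRIBUTE_SPARSE_FILE (0x200) and FILE_ATTRIBUTE_REPARSE_POINT (0x400). A minimal Python sketch of checking them follows. The function name is hypothetical, and the sample value 0x600 contains only the two bits shown in the quote, since the full attribute field from packet 1842 is not given here.

```python
# Windows file attribute constants (values as documented by Microsoft).
FILE_ATTRIBUTE_SPARSE_FILE = 0x00000200
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400


def decode_sis_related_attributes(attrs):
    """Return the names of the two attributes of interest set in attrs.

    attrs is the 32-bit file attributes value from the trace.
    Only the sparse and reparse-point bits are checked.
    """
    names = []
    if attrs & FILE_ATTRIBUTE_SPARSE_FILE:
        names.append("SPARSE_FILE")
    if attrs & FILE_ATTRIBUTE_REPARSE_POINT:
        names.append("REPARSE_POINT")
    return names


if __name__ == "__main__":
    # 0x600 is just the two bits from the quoted trace, not the full field.
    print(decode_sis_related_attributes(0x600))
```

A file showing both bits is consistent with what support describes, though the bits alone do not prove the file is a SIS link; the reparse tag would be needed for that.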
I questioned this with the respective person and was advised to follow the bug and read the article.
I too questioned that but was told to read the above link.