2014-07-16 11:39 AM
We're dealing with this bug on our FAS3240 systems. I wanted to see who else has hit it and whether you got it resolved. What did you do to fix it? How long did it take?
Stale metadata not automatically removed as part of the 'sis start' operation on the volume when running Data ONTAP® 8.1x
KB ID: 7010056 Version: 13.0 Published date: 06/23/2014 Views: 8775
2014-07-25 04:57 AM
We have a few customers who were hit by this problem. You can see it if "sis status -l" prints incredibly large numbers for stale metadata (in one case we saw around 3500% stale metadata, but anything over 30% or so may indicate a problem).
If you're hit by that bug, I found that the only solution that fixes it 100% of the time is the following:
This has always fixed it for us. Note that in a few cases on 8.1.2P4 a simple "sis start -s" after the upgrade did not help; we had to do a "sis reset" first.
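To spot affected volumes quickly, you can screen the "sis status -l" output for the 30% rule of thumb mentioned above. This is a rough sketch: the exact output layout and field name ("Stale Fingerprint Percentage") vary by ONTAP release, so the sample text below is an assumption you would adjust to match your system.

```python
# Sketch: flag volumes whose "sis status -l" stale percentage looks suspicious.
# The SAMPLE_OUTPUT format is an assumption; adjust the field name to match
# what your ONTAP release actually prints.

import re

SAMPLE_OUTPUT = """\
Path: /vol/vol_vmware
State: Enabled
Stale Fingerprint Percentage: 3500 %
Path: /vol/vol_cifs
State: Enabled
Stale Fingerprint Percentage: 12 %
"""

def suspicious_volumes(text, threshold=30):
    """Return (path, percent) pairs where stale metadata exceeds threshold."""
    hits = []
    path = None
    for line in text.splitlines():
        if line.startswith("Path:"):
            path = line.split(":", 1)[1].strip()
        m = re.match(r"Stale Fingerprint Percentage:\s*(\d+)", line)
        if m and path:
            pct = int(m.group(1))
            if pct > threshold:
                hits.append((path, pct))
    return hits

print(suspicious_volumes(SAMPLE_OUTPUT))
```

Anything this flags is a candidate for the "sis start -s" rebuild after upgrading.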
2014-07-25 09:11 AM
Michael, thank you for the reply.
I believe we have it resolved now.
I upgraded our nodes to 8.2.1P1 early Wednesday morning and let dedup run on its schedule twice on the volumes afterwards (Wednesday night and Thursday night), as stated in the KB.
After running the math again using NetApp's formula, we saw the stale fingerprints drop. Some volumes were in the 500-900% range and are now down to 8-20%.
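The KB's formula itself isn't quoted in this thread, so the arithmetic below is an assumption: it takes stale percentage as stale fingerprint records divided by the records covering current data. That form would at least explain how values far above 100% (like the 3500% mentioned earlier) are possible.

```python
# Hedged sketch of the stale-percentage arithmetic. NetApp's actual KB
# formula is not quoted in the thread; this ratio (stale records vs. records
# for live data) is an assumption, but it shows why >100% readings occur.

def stale_percentage(stale_records, current_records):
    """Percent of stale fingerprint records relative to live-data records."""
    return 100.0 * stale_records / current_records

# A volume with 9x as many stale records as live records reads as 900%...
print(stale_percentage(9_000_000, 1_000_000))   # 900.0
# ...and after a "sis start -s" rebuild it can drop to ~20%.
print(stale_percentage(200_000, 1_000_000))     # 20.0
```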
2014-07-28 09:25 PM
Yes, indeed. A managed-services client had a LUN go offline because of this. Looking at the used storage, I couldn't figure out where it all was. One support case later, I found out this was the problem. The volume had about 700GB of stale metadata, which took roughly 8 hours to reduce. We found the bug after we upgraded to 8.2.1 from a buggy 8.1.2 release; it just so happened the volume filled up the morning after the upgrade. Awesome timing.
Please consider marking this answer "correct" or "helpful" if you found it useful
VMware, Cisco Data Center, and NetApp dude
2014-07-29 01:02 PM
To add another tidbit to this.
We originally engaged NetApp performance engineering because committing VMware snapshots that included a memory snapshot would completely paralyze our controllers. Now that we have upgraded and run dedup twice to remove the stale fingerprints, the issue is gone. We can commit the snapshots and the controllers keep rolling.
Not sure if it was another bug or the stale fingerprints. Either way, not fun.
2015-08-09 08:18 PM - edited 2015-08-09 08:18 PM
We too have this problem at my work, and the bug is affecting all versions of ONTAP.
No ETA from NetApp as to when we can expect a fix.
2015-08-09 10:19 PM
Just repeating what NetApp support advised:
I have found our answer in bug 931439. It is new and very much still under investigation, but I'll outline what's going on. When you do a copy in this manner, Windows tries to create a SIS link instead of rehydrating the file immediately.
We can see that this is the case from the trace in packet 1842.
.... .... .... .... .... ..1. .... .... = Sparse: A SPARSE file
.... .... .... .... .... .1.. .... .... = Reparse Point: Has an associated REPARSE POINT
Here is more on reparse points.
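The two bits called out in the trace correspond to the standard Windows file-attribute constants (FILE_ATTRIBUTE_SPARSE_FILE = 0x200 and FILE_ATTRIBUTE_REPARSE_POINT = 0x400). As a small illustration of how those flag bits decode (the helper function is mine, not from the support case):

```python
# Decoding the two SMB file-attribute bits shown in the trace above.
# These values are the standard Windows FILE_ATTRIBUTE_* constants.

FILE_ATTRIBUTE_SPARSE_FILE = 0x200    # "..1. .... ...." in the capture
FILE_ATTRIBUTE_REPARSE_POINT = 0x400  # ".1.. .... ...." in the capture

def describe_attributes(attrs):
    """List which of the two interesting attribute bits are set."""
    names = []
    if attrs & FILE_ATTRIBUTE_SPARSE_FILE:
        names.append("SPARSE")
    if attrs & FILE_ATTRIBUTE_REPARSE_POINT:
        names.append("REPARSE_POINT")
    return names

# A file flagged as both sparse and a reparse point, as in packet 1842:
print(describe_attributes(0x200 | 0x400))  # ['SPARSE', 'REPARSE_POINT']
```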
I questioned this with the person handling our case and was advised to follow the bug and read the article.
I too questioned that but was told to read the above link.
Today I think we hit this bug hard: a 3.7TB aggregate filled up with 1.2TB of A-SIS metadata.
Total space  WAFL reserve  Snap reserve  Usable space  BSR NVLOG  A-SIS   Smtape
3673GB       367GB         0KB           3306GB        0KB        1122GB  0KB

Space allocated to volumes in the aggregate
Volume                      Allocated  Used    Guarantee
volroot                     163GB      4167MB  volume
linux_virtual_machines      905GB      900GB   none
test_iscsi_vol              4487MB     4308MB  none
microsoft_virtual_machines  106GB      105GB   none
iscsi_linux_vms_vol         138GB      137GB   none
mail_fast                   707GB      700GB   none

Aggregate     Allocated  Used    Avail
Total space   2027GB     1853GB  156GB
Snap reserve  0KB        0KB     0KB
WAFL reserve  367GB      39GB    327GB
We cleaned up a little, so we now have 150GB free and the services running.
I would like to resolve the issue, and I'm asking myself if it's wise to:
- Disable SIS now, to avoid the disks filling up again tonight:
sis off /vol/<volname>
for every volume that has deduplication enabled
- When convenient (maybe in the weekend) run
sis start -s /vol/<volname>
- When convenient, ask my specialist to upgrade Data ONTAP to a fixed version
Am I right?
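The plan above boils down to generating the same two commands per volume. A minimal sketch (volume names are placeholders; this only builds the command strings, which you would then run from the ONTAP CLI):

```python
# Sketch: generate the ONTAP 7-Mode commands for the remediation plan above.
# Volume names are placeholders; the script only builds command strings.

def remediation_commands(volumes):
    """Step 1 (now): disable dedup. Step 2 (weekend): rebuild fingerprints."""
    disable = [f"sis off /vol/{v}" for v in volumes]
    rebuild = [f"sis start -s /vol/{v}" for v in volumes]
    return disable, rebuild

disable_now, rebuild_later = remediation_commands(
    ["linux_virtual_machines", "mail_fast"]
)
print("\n".join(disable_now))
print("\n".join(rebuild_later))
```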
PS: I'm also wondering whether there is a way to see how many stale fingerprints I have.
The percentages shown in "sis status -l" are not very high, and it seems to me that they are wrong!