2016-12-22 09:26 AM - edited 2016-12-23 04:21 AM
We have just completed a Copy-Free Transition from 7-Mode to clustered ONTAP 9.0 using the 7-Mode Transition Tool (7MTT), moving the shelves from a FAS3220 to a FAS8040.
The upgrade went well, but since completion our home drives volume (18TB / 50 million files) has been seeing huge snapshot deltas, up to 10 times their previous size, which are rapidly filling the volume. The filer generates these changes at certain points during the day: we can watch a snapshot suddenly surge from a few hundred MB to over 200GB in just an hour. It's definitely a filer process, as it happens once or twice every day, no user has a quota of more than 10GB, and in fact most users aren't even here due to the Christmas holidays.
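For anyone trying to pin down the same symptom, these are the sorts of commands that show per-snapshot usage and the rate of change between snapshots (the SVM, volume, and node names here are placeholders for your own):

```shell
# Clustered ONTAP: show the size of each snapshot on the volume
volume snapshot show -vserver svm1 -volume homedirs -fields size

# Nodeshell: show the rate of change between successive snapshots
node run -node node1 -command snap delta homedirs
```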
There is no storage efficiency enabled on this volume, so we can rule that out as the cause. The volume belongs to an SVM, in case that is relevant.
We have a case open with support via our support partner, but we've had no response yet. Given that this is our primary home drives volume and these huge snapshots are happening every day, we are very quickly going to be in a sticky situation.
I'm just wondering if anyone has seen anything like this, or can give us some clues on where to look.
I ran a wafl scan status while the problem was happening, but nothing was showing there.
The only other possible cause identified from our AutoSupports was something called an L2I scan, but neither we nor our support partner know what this is.
2016-12-22 03:57 PM - edited 2016-12-22 04:02 PM
Did the workflow delete the aggregate snapshot created by CFT?
You mean the 7MTT workflow? The work was carried out on Saturday by an engineer from our support partner, so I'm not sure. How can I tell?
I'm also totally new to cDOT, which doesn't help, so I'm just trying to get my head around the commands to list the aggregate snapshots. This can only be done from the CLI, is that right? If so, this looks to be the command:
node run -node filername -command snap list -A
which returns the following for the aggregate in question. Is this what you were looking to check?
%/used %/total date name
---------- ---------- ------------ --------
1% ( 1%) 1% ( 1%) Dec 22 19:00 hourly.0
2% ( 1%) 1% ( 1%) Dec 22 14:00 hourly.1
2% ( 0%) 1% ( 0%) Dec 22 09:00 hourly.2
2% ( 0%) 1% ( 0%) Dec 22 00:00 nightly.0
3% ( 1%) 2% ( 0%) Dec 21 19:00 hourly.3
2016-12-22 04:21 PM
I don't see the CFT snapshot, so the workflow completed. I would turn off aggregate snapshots:
node run -node nodename -command aggr options aggrname nosnap on
then delete the existing aggregate snapshots (the flags go before the aggregate name):
node run -node nodename -command snap delete -A -a -f aggrname
2016-12-23 04:13 AM
I can do that, although I'd like to understand your thinking: what do you suspect is happening that makes you suggest this?
What would cause aggr snapshots to affect volume snapshots in this manner?
2017-05-17 09:07 AM
In case anyone else comes across this thread: after a fair amount of back and forth, NetApp have acknowledged that this looks like a bug in ONTAP.
Our testing demonstrated that files created and then deleted in between snapshots are not cleaned up properly, and their associated block changes still end up in snapshots when ordinarily they shouldn't. They don't land in the snapshots immediately either; they only appear some time later, when the filer performs its housekeeping routines, which is why this was initially so difficult to track down. In the end we resorted to testing with an empty volume and a couple of large files.
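To make the expected behaviour concrete, here is a toy model of snapshot block accounting (purely illustrative, not how WAFL is actually implemented): a snapshot should only pin blocks that were live at the instant it was taken, so a file created and fully deleted between two snapshots should contribute nothing to the second one.

```python
# Toy model of snapshot block accounting (illustrative only, not WAFL).
# A snapshot freezes the set of blocks live at the instant it is taken;
# its "size" is the blocks it pins that are no longer in the active FS.

active = set()       # blocks currently live in the active file system
snapshots = []       # each snapshot is a frozen copy of `active`

def take_snapshot():
    snapshots.append(frozenset(active))

def snapshot_size(snap):
    # Blocks held only by this snapshot, i.e. the space charged to it
    return len(snap - active)

# Snapshot 1: volume holds blocks 0-9
active |= set(range(10))
take_snapshot()

# Between snapshots: a temp file (blocks 100-199) is created, then deleted
active |= set(range(100, 200))
active -= set(range(100, 200))

# Snapshot 2
take_snapshot()

# Correct accounting: the transient blocks never appear in snapshot 2
print(snapshot_size(snapshots[1]))   # 0 - nothing charged
```

The bug we hit behaved as if those transient blocks were still being charged to the next snapshot once the later housekeeping ran, which is exactly what the model above says should not happen.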
In our case we originally spotted the issue on our home drives volume immediately after the upgrade to cluster mode. There is a lot of activity on this volume, with applications such as Office continuously creating and deleting temporary files, which is why our daily snapshots grew from 40-50GB to 300-400GB. Now we know why.
We don't have much detail yet, but it looks likely this would affect anyone running cluster mode: your snapshot usage may be much larger than it should be.
The Bug ID is here if you want to watch it