Snapshots get stuck and require manual intervention

osp · 2018-03-29

Hi, I am an end user; I do not manage storage. I know we are on ONTAP 9.1 P5.

We have a 3-day daily snapshot schedule, so each volume keeps a SnapMirror snapshot plus three 'daily' snapshots. Snapshots roll off automatically after 3 days.

Sometimes a volume reaches 100% usage.

Sometimes when this happens, the volume will get stuck. I was told internally that "the volume will not snapmirror sync anymore due to lack of space to create the update snapshot".

When this happens, snapshots stop working: SnapMirror stops, and snapshots do not roll off automatically.

After a few days pass, the .snapshot subdirectory still shows the SnapMirror snapshot directory and 'daily' snapshot directories that are far more than 3 days old.

The fix requires human intervention: space must be added to the volume so that the snapshot process can operate again.

My question is: why was it designed like this? In other words, why must disk space be added? Why can't the existing snapshots, which are (by this time) more than 3 days old, just roll off automatically? This rolloff would very likely release free space to the volume.

It seems odd to me that the system cannot manage itself, even when the disk reaches 100% usage.

Please help me understand this, thank you!!

kahuna · 2018-04-01

The storage controller cannot and should not decide what is more important for the user: snapshots, writing to the volume, or SnapMirror. That said, you have the option to tell it to either delete snapshots or grow the volume when it is running out of space. See here:

space management

vol modify commands

osp · 2018-04-03

Thanks for the reply. You said "you have the option to tell it to either delete snapshots". Do you mean this can be fully automated? Or must this be done manually? Thank you again!

kahuna · 2018-04-03

Automated. Once set, the system will attempt to delete snapshots to recover space upon reaching the capacity threshold. You can also specify whether to delete the oldest or the newest snapshots first.
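
To illustrate the idea, here is a toy Python model of threshold-triggered snapshot deletion. It is only a sketch of the behaviour described above, not ONTAP code: the class names, the per-snapshot "held" space figure, and the parameters threshold_pct, target_free_pct and oldest_first are all made up for illustration and are not ONTAP option names.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    name: str
    age_days: int
    held_gb: float  # space that only this snapshot is holding on to (assumed)


@dataclass
class Volume:
    size_gb: float
    active_gb: float  # space used by the live file system
    snapshots: list = field(default_factory=list)

    def used_gb(self) -> float:
        return self.active_gb + sum(s.held_gb for s in self.snapshots)

    def used_pct(self) -> float:
        return 100.0 * self.used_gb() / self.size_gb


def autodelete(vol, threshold_pct=98.0, target_free_pct=20.0, oldest_first=True):
    """Once the volume is at or past threshold_pct, delete snapshots until
    target_free_pct of the volume is free or no snapshots remain.
    Returns the names of the deleted snapshots, in deletion order."""
    deleted = []
    if vol.used_pct() < threshold_pct:
        return deleted  # below the trigger, nothing to do
    # Sort so that the next snapshot to delete sits at the end of the list.
    vol.snapshots.sort(key=lambda s: s.age_days, reverse=not oldest_first)
    while vol.snapshots and (100.0 - vol.used_pct()) < target_free_pct:
        deleted.append(vol.snapshots.pop().name)
    return deleted
```

In this model a full 100 GB volume whose oldest snapshot holds 20 GB gets back to 20% free by deleting just that one snapshot, while the newest-first order has to delete all three before it reaches the same target.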

osp · 2018-04-03

Gotcha. The reason I am asking is that our storage team currently has it configured for 3 daily snapshots, with only 3 retained, so the oldest rolls off automatically each night. As mentioned, some volumes reach 100% usage, and when nighttime arrives, the oldest snapshot does NOT roll off, so the volume gets "stuck". If no manual action is taken, nothing happens. Days can go by with no new snapshots taken at night, and the 3 daily snapshots sitting there are (by this time) many days old. The volume just stays stuck at 100% with no snapshot activity whatsoever.

It was confusing to me as an end user, because I expected the system to roll off the oldest snapshot automatically. My understanding is that doing so might recover enough space for the storage system to resume normal operations and take its regular nightly snapshots.

I am trying to understand this better to reduce the amount of manual intervention...

If I have any details incorrect, my apologies, and thank you again for all your help on this.

osp · 2018-04-04

The information from our storage team is that:

first the new snapshot must be taken,

only then the old snapshots can be auto-rolled off

I was really hoping the order could be reversed, but I'm told this cannot be configured like this.

osp · 2018-04-06

I suggest a modification to NetApp for a future release:

Allow the user to configure the system so that when the automatic snapshot process runs, it first DELETES the oldest snapshot before attempting to create the new one.

This way there is some chance that the deletion of the very old snapshot could release some space to permit the normal operations of the volume to continue, when very close to max usage.
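
To make the difference between the two orderings concrete, here is a small Python sketch. It is a toy model under stated assumptions, not a description of what ONTAP does internally: the function name nightly_rotation, the idea that creating a snapshot needs a fixed amount of free space (new_snapshot_needs_mb), and the per-snapshot space figures are all hypothetical.

```python
def nightly_rotation(free_mb, snapshot_held_mb, new_snapshot_needs_mb, delete_first):
    """Model one nightly run and return (snapshot_taken, free_mb_after).

    snapshot_held_mb lists the space held by each existing snapshot,
    oldest first. All sizes are whole megabytes (assumed values).
    """
    held = list(snapshot_held_mb)
    if delete_first and held:
        # Suggested order: expire the oldest snapshot before creating.
        free_mb += held.pop(0)
    if free_mb < new_snapshot_needs_mb:
        # Creation fails, so the run stops here and nothing else rolls off.
        return False, free_mb
    free_mb -= new_snapshot_needs_mb
    if not delete_first and held:
        # Order as described by our storage team: expire only after creating.
        free_mb += held.pop(0)
    return True, free_mb


# A full volume whose oldest snapshot holds 5000 MB:
stuck = nightly_rotation(0, [5000, 2000, 1000], 100, delete_first=False)
recovered = nightly_rotation(0, [5000, 2000, 1000], 100, delete_first=True)
```

With create-then-delete, the full volume never gets far enough to expire anything, which matches the "stuck" state I described. With delete-then-create, the same volume frees the oldest snapshot's space first and the nightly snapshot can go ahead, at least in this simplified model.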

Please let me know your thoughts - thank you

osp · 2018-04-09

Please reply and let me know if the suggestion above makes sense. Thanks!

osp · 2018-04-30

*bump*