Data Backup and Recovery
Many years ago we discovered the hard way that if a VM has a VMware snapshot on it, the NetApp snapshot for the datastore that VM resides on will not include that VM in the backup. This state will persist until the VMware snapshot is removed.
We are now in a situation where someone left a VMware snapshot in place for the entire NetApp snapshot retention period, and we then needed to revert to that VMware snapshot. Worse, the VMware snapshot revert didn't go well. We eventually found a way to clean up the restored VM, but it all got me thinking.
Maybe things have changed? I noticed that the SnapCenter wizard includes the VM in its restore options, just like any other VM. Has the limitation been removed? Can restores be done for any VM, regardless of whether a VMware snapshot is attached? Can anyone provide insight on this topic? Thanks in advance!
I am not sure why you had issues in the past. A NetApp volume snapshot is a point-in-time copy of the volume: all blocks in use at that moment are locked. It does not matter whether the snapshot was created by an application or on a schedule. While there could be issues with backup applications, the snapshot will contain every VM in a crash-consistent state, and a VM can be recovered manually from the volume snapshot. Please let me know if you have questions. The KBs below will help with a manual recovery.
How to manually restore a VMware ESX virtual machine from a snapshot volume/LUN
How to manually restore a VMware ESX virtual machine from an NFS volume/datastore snapshot
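On an NFS datastore, a manual restore essentially means copying the VM's folder out of the read-only `.snapshot` directory of the datastore volume into a new folder. The following is a minimal Python sketch of that copy step, not the procedure from the KBs. It assumes the datastore volume is NFS-mounted on an admin host, that `.snapshot` directory access is enabled on the volume, and that the function name `restore_vm_from_nfs_snapshot` and all paths are hypothetical.

```python
import shutil
from pathlib import Path


def restore_vm_from_nfs_snapshot(datastore_mount: Path, snapshot_name: str,
                                 vm_folder: str, restore_folder: str) -> Path:
    """Copy a VM folder out of a NetApp NFS snapshot into a new folder.

    datastore_mount: local mount point of the datastore volume (hypothetical).
    snapshot_name:   name of the NetApp snapshot under .snapshot/.
    vm_folder:       the VM's directory on the datastore.
    restore_folder:  new directory to restore into; never overwritten.
    """
    source = datastore_mount / ".snapshot" / snapshot_name / vm_folder
    target = datastore_mount / restore_folder
    if not source.is_dir():
        raise FileNotFoundError(f"no such VM folder in snapshot: {source}")
    if target.exists():
        # Refuse to overwrite the live VM or an earlier restore attempt.
        raise FileExistsError(f"restore target already exists: {target}")
    # copytree copies every file in the folder recursively, including any
    # VMware snapshot delta disks and the .vmsd metadata, so the whole
    # disk chain comes back together.
    shutil.copytree(source, target)
    return target
```

Restoring into a new folder rather than over the original keeps the live VM untouched. The restored `.vmx` would then be registered in vSphere, for example through the datastore browser's Register VM action.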
Thank you @NetApp_SR. I've done the manual restore process in the past when SnapCenter (or VSC previously) wouldn't work for some reason (ours is an NFSv3 environment). However, I'm pretty sure NetApp Support explicitly told us back around 2014-2015 that the reason we couldn't recover a VM on one particular occasion was the existence of a VMware snapshot on that VM. As a result, we established a policy of not retaining VMware snapshots beyond a couple of hours when we use them, but that information obviously got lost with some newer staff. I'm also pretty sure I raised the question again a few years later and it was confirmed that this was still the case. Maybe it is no longer the case? Or maybe both parties missed something and this was never the issue at all?
Every technical issue has particulars that make the incident unique, so without the details I can only make general observations. VMware recommends not keeping a VMware snapshot for more than 72 hours; see KB 1025279 below. Recovering the files from a NetApp snapshot is normally not a problem: VMs with snapshots are just files on the disk. There certainly can be issues with a VM having corrupted files. Per VMware KB 1006585, "File corruption is random in nature." If the consolidation of a large VM snapshot is interrupted, the VM can become corrupt and unusable. I have seen very large VMs that took days to consolidate, so it can be tempting to try to stop the process. If there are issues with a VM once its files are recovered, I would suggest opening a support case with VMware Support. NetApp support engineers do assist customers with issues outside our products, but we do not have some of the escalation resources a vendor has for its own product.
Best practices for using VMware snapshots in the vSphere environment (1025279)
https://kb.vmware.com/s/article/1025279
Corrupt redo log causes errors within the virtual machine while powering on ESXi (1006585)
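Given the 72-hour recommendation in KB 1025279, and the stricter in-house limit of a couple of hours mentioned earlier in the thread, a periodic check for stale VMware snapshots could catch the situation that started this thread before it outlives the NetApp retention period. Below is a minimal Python sketch under stated assumptions: snapshot creation times have already been exported from vSphere (for instance, the `createTime` of each entry in a VM's snapshot tree), and the `VmSnapshot` type and `stale_snapshots` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VmSnapshot:
    vm: str            # VM name (hypothetical export field)
    name: str          # VMware snapshot name
    created: datetime  # timezone-aware creation time


def stale_snapshots(snapshots, now, max_age=timedelta(hours=72)):
    """Return the snapshots older than max_age, oldest first."""
    stale = [s for s in snapshots if now - s.created > max_age]
    return sorted(stale, key=lambda s: s.created)
```

Passing `max_age=timedelta(hours=2)` would enforce the tighter in-house policy instead of VMware's general guideline. Any threshold shorter than the NetApp snapshot retention period helps ensure that a clean NetApp copy of the VM still exists when the VMware snapshot is finally removed.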
Interesting. Thanks @NetApp_SR. I'll keep researching.