Data Backup and Recovery
Hi all,
I just thought I would share a document I created to fill the gap between SnapProtect, DFM and the filers. One of the biggest annoyances in my environment is when a VM that is part of the SnapProtect VSA backup job is moved to a different datastore or deleted.
Background on what causes the issue and why:
The way SnapProtect and DFM interact: each backupset in SnapProtect is directly related to a dataset in DFM. The problem is that when you remove a VM from SnapProtect, or a VM moves to a different datastore and the old datastore no longer holds any VMs in the backupset, the VM and vault relationship is not cleaned up as you would expect. The relationship continues to exist and prevents the job aging process from completing. Sorry, it's hard to follow with both scenarios at once, so I'll stick to the VM deletion scenario and mention the datastore move later in the post.
So say we have a VM that is part of a backupset in SnapProtect called "VM Backups", and that job has a relationship to the DFM dataset CC-colo-snap_SS-2997_SC-55.
Now you have a VM, Production-VM, that has been backed up by the VSA and vaulted to a secondary NetApp filer. The VM is on an NFS datastore called NFS_VM08R2_01, which is vaulted to the 2240 filer into a volume called SP_backup_colona1a_NFS_VM08R2_01_2.
Following the attached document: if you delete Production-VM from VMware, you first remove the VM from the backup job as normal, since it no longer exists. Then you need to:
1. Connect to the DFM server, find the backupset the VM belonged to, and relinquish the DFM relationship.
2. Connect to the DFM interface, edit the dataset, and remove the VM from the physical resource and the backup resource.
3. Connect to the backup filer and stop the vault relationship.
4. Delete the snapshot on the primary filer.
5. Manually delete the snapshots from the Array Manager in SnapProtect.
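Because the order matters, it can help to write the steps above down as a checklist generated from the names in play. This is a minimal Python sketch: the ProtectedVM class and its field names are hypothetical, the values are the example names from this post, and the code only prints the steps; it does not talk to SnapProtect, DFM or the filers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectedVM:
    """One VM and the objects tying it to a backupset and a vault (hypothetical field names)."""
    vm: str
    backupset: str
    dfm_dataset: str
    datastore: str
    secondary_filer: str
    secondary_volume: str


def cleanup_steps(p: ProtectedVM) -> list[str]:
    """Return the manual clean-up steps for a deleted VM, in the order described above.

    This only builds a checklist; nothing is executed against any system.
    """
    return [
        f"SnapProtect: remove {p.vm} from the backup job in backupset '{p.backupset}'",
        f"DFM server: find the backupset '{p.backupset}' ({p.dfm_dataset}) and relinquish the relationship",
        f"DFM interface: edit {p.dfm_dataset} and remove {p.vm} from the physical and backup resources",
        f"{p.secondary_filer} (backup filer): stop the vault relationship for {p.secondary_volume}",
        f"Primary filer: delete the SnapProtect snapshot on {p.datastore}",
        "SnapProtect Array Manager: manually delete the matching snapshots",
    ]


if __name__ == "__main__":
    example = ProtectedVM(
        vm="Production-VM",
        backupset="VM Backups",
        dfm_dataset="CC-colo-snap_SS-2997_SC-55",
        datastore="NFS_VM08R2_01",
        secondary_filer="2240",
        secondary_volume="SP_backup_colona1a_NFS_VM08R2_01_2",
    )
    for number, step in enumerate(cleanup_steps(example), start=1):
        print(f"{number}. {step}")
```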
When a VM moves to a different datastore: if no VMs in the SnapProtect backupset remain on the datastore that used to house it, that datastore's snapshots are no longer updated and cannot be aged because of the vault relationship. While this snapshot cannot be aged, jobs on the other datastores cannot be aged either, and they stay locked until you clean up manually using this process.
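The knock-on effect on other datastores can be pictured with a toy model. This Python sketch assumes, as a simplification not confirmed by the post, that a backup job ages as a unit, so a single snapshot still held by an orphaned vault relationship keeps every other snapshot in that job. The job IDs and the second datastore name NFS_VM08R2_02 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    datastore: str
    # True while a vault relationship that was never cleaned up still holds this snapshot.
    held_by_vault: bool = False


@dataclass
class BackupJob:
    job_id: int
    snapshots: list[Snapshot] = field(default_factory=list)

    def can_age(self) -> bool:
        # Simplifying assumption: the job ages as a unit, so one held snapshot blocks it all.
        return not any(s.held_by_vault for s in self.snapshots)


def blocked_datastores(jobs: list[BackupJob]) -> set[str]:
    """Datastores whose snapshots are kept only because they share a job with a held snapshot."""
    blocked = set()
    for job in jobs:
        if not job.can_age():
            blocked.update(s.datastore for s in job.snapshots if not s.held_by_vault)
    return blocked


jobs = [
    BackupJob(101, [Snapshot("NFS_VM08R2_01", held_by_vault=True), Snapshot("NFS_VM08R2_02")]),
    BackupJob(102, [Snapshot("NFS_VM08R2_02")]),
]
print(blocked_datastores(jobs))  # {'NFS_VM08R2_02'}
```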
I hope this helps some people, since it has been a very frustrating implementation.
Thanks,
Mike
Does anyone have a similar guide for OCUM 6.1?
I removed a volume from the SnapProtect backup, deleted all SnapProtect snapshots on the primary and secondary volumes, and ran data aging. Then I removed the relationship and destroyed the secondary volume. When I now try to do a SnapProtect backup of the primary volume, I get this error message:
Protection Job Failed. Reason: Managed volume XYZ of subscribed volume XYZ is missing on node XYZ
Possible reasons include:
The managed volume is explicitly deleted by the user, not through a partner application invoked storage-service API call. To rectify this condition use a partner application UI to invoke the storage-service-cleanup API and the storage-service-conform APIs, first to remove this volume node member from this storage service and then to create a new protection volume as part of the storage service.
How do you run the "storage-service-cleanup" and "storage-service-conform" APIs from SnapProtect?
If you're using the same storage policy, please open a case against the CommServe serial number.
I haven't seen this error before.
Is the error from SnapProtect or from DFM? Based on the description, it sounds like it's from DFM.
If you look in DFM can you see any external relationships?
If you SSH to the secondary "Backup" filer, run snapvault status and check whether the vault relationship is listed. If it is, you will need to stop the vault relationship.
OCUM 6.x doesn't provide a user interface for manually managing the protection relationships. I have wondered if it's possible to use the CLI if you SSH into the OCUM 6.x virtual appliance, but I haven't tried it.
According to a recent SnapProtect best practices session at NetApp Insight 2014, NetApp recommend:
1. Select a datastore as the sub-client object (not a VM).
2. 1 datastore = 1 volume
3. Have a 1:1 relationship between SubClients and Storage Policies.
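To make the three recommendations above concrete, here is a small layout checker. It is a minimal Python sketch under stated assumptions: the Subclient class, its content_type values ("datastore" or "vm") and the datastore-to-volume mapping are hypothetical stand-ins for whatever you export from your CommCell and filers, and nothing here queries SnapProtect.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Subclient:
    name: str
    content_type: str    # hypothetical: "datastore" or "vm"
    storage_policy: str


def check_layout(subclients: list[Subclient], datastore_to_volume: dict[str, str]) -> list[str]:
    """Return a warning for each departure from the three recommendations above."""
    warnings = []
    # 1. A datastore, not individual VMs, as the subclient content.
    for sc in subclients:
        if sc.content_type != "datastore":
            warnings.append(f"{sc.name}: content is '{sc.content_type}', not a datastore")
    # 2. 1 datastore = 1 volume: no volume should back more than one datastore.
    volume_use = Counter(datastore_to_volume.values())
    for ds, vol in datastore_to_volume.items():
        if volume_use[vol] > 1:
            warnings.append(f"{ds}: volume {vol} also backs another datastore")
    # 3. A 1:1 relationship between subclients and storage policies.
    policy_use = Counter(sc.storage_policy for sc in subclients)
    for sc in subclients:
        if policy_use[sc.storage_policy] > 1:
            warnings.append(f"{sc.name}: storage policy {sc.storage_policy} is shared")
    return warnings


subclients = [
    Subclient("SC-NFS01", "datastore", "SP-NFS01"),
    Subclient("SC-Apps", "vm", "SP-NFS01"),
]
for warning in check_layout(subclients, {"NFS_VM08R2_01": "vol_nfs01", "NFS_APPS": "vol_apps"}):
    print(warning)
```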
If necessary, you can use VM filters to stop particular VMs in a datastore from being backed up (of course they'll still be included in the snapshot, but not in any SnapProtect backup copies, and VMware won't do an ESX snapshot for them).
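The effect of a VM filter described above can be summarised in a few lines. This Python sketch is a simplified model, not SnapProtect's logic: the function name and the VM names are hypothetical. It separates what the volume-level storage snapshot captures (everything on the datastore) from what gets a VMware snapshot and appears in backup copies (only unfiltered VMs).

```python
def plan_datastore_backup(vms_on_datastore: list[str], vm_filters: list[str]) -> dict[str, list[str]]:
    """Split a datastore's VMs by how a VM filter affects them (simplified model)."""
    filtered = set(vm_filters)
    return {
        # The storage snapshot is volume-wide, so every VM on the datastore is captured.
        "in_storage_snapshot": list(vms_on_datastore),
        # Only unfiltered VMs get a VMware snapshot and appear in SnapProtect backup copies.
        "vmware_snapshot_and_backup_copies": [vm for vm in vms_on_datastore if vm not in filtered],
        "excluded_by_filter": [vm for vm in vms_on_datastore if vm in filtered],
    }


plan = plan_datastore_backup(["web01", "tempdb01", "app01"], ["tempdb01"])
print(plan["vmware_snapshot_and_backup_copies"])
```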
If you do no. 3 above, you can delete the storage policy and SnapProtect / OCUM *should* clean up (remove) the volumes and relationships. There seems to be a cleanup process that runs periodically in OCUM to do this, but I don't know how it is scheduled or triggered. It doesn't run when data ageing runs. I have seen it take up to 2.5 days to destroy the protection relationship and associated volumes after the storage policy was deleted.
Where I needed to remove a volume from an existing protection relationship that contained multiple volumes, I have sometimes just deleted the volumes manually. OCUM logs an error but the other volumes in the relationship seem to continue to work. I don't know if this is a good idea or not, but so far it hasn't caused me any problems.
Hopefully they post the session from Insight. I would be interested to see how they approach different environments.
Regarding the subclient-per-volume idea and not adding VMs to the subclient: what about VMs with VMDKs across different filers (SSD, SAS, SATA) or even different datastores? If you use the volume approach, you may not have consistent restores, and you will also be taking multiple VM snapshots for the different subclient schedules.
I am at the point of considering this configuration, but I also need to ensure I'm not adding any complexity or extra load on the VMs.
Also, when dumping to tape, I wonder how it will go trying to register the VM when the .vmx is on a different datastore. (Part of the restore or dump to tape registers the VM with _GX appended to the VM name.) I wonder how this would work when you attempt to restore one of the additional disks not stored with the .vmx.
Thanks,
Mike
Insight session slides are available for attendees and partners at https://www.brainshark.com/go/netapp-sell/insight-library.html?cf=6729&c=5 -- search for SnapProtect Best Practices. They'll be available to customers sometime in mid-December.
If you use a Datastore as the backup target (contents) for the Subclient - rather than individual VMs - and the VM has VMDKs in multiple datastores, then the backup job will identify all of the volumes on which the VM's VMDKs reside, and create ONTAP snapshots on all of them. The VM backup will be consistent because of the vSphere-level snapshots (i.e. at the VMDK level). When (if?) you subsequently run SnapVault and SnapMirror copies in your Storage Policy, you'll get a new set of SM/SV target volumes on your secondary storage system for each of the subclients.
Because each new subclient results in a new set of mirrors for the volumes it references, the recommendation is to have one subclient per volume if at all possible. If your VMs span different datastores, it will still work, but you'll get multiple mirror volumes, i.e. one new volume mirror for each subclient that contains a VM with storage on that volume.
You can appreciate that if you have a lot of VMs using multiple shared datastores, and many different subclients that end up using the same volumes, you can end up with a lot of mirror volumes.
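The mirror count described above is easy to estimate before changing a layout. This Python sketch follows the rule stated in this reply (one mirror volume per subclient per primary volume that any of its VMs touches) for a single secondary copy; the subclient, VM and volume names are hypothetical.

```python
def volumes_for_subclient(vms: list[str], vmdk_volumes: dict[str, list[str]]) -> set[str]:
    """All primary volumes holding any VMDK of any VM in the subclient (each gets an ONTAP snapshot)."""
    return {vol for vm in vms for vol in vmdk_volumes[vm]}


def mirror_volumes(subclients: dict[str, list[str]], vmdk_volumes: dict[str, list[str]]) -> set[tuple[str, str]]:
    """One secondary mirror/vault volume per (subclient, primary volume) pair."""
    return {
        (name, vol)
        for name, vms in subclients.items()
        for vol in volumes_for_subclient(vms, vmdk_volumes)
    }


vmdk_volumes = {
    "vm1": ["vol_ssd", "vol_sas"],
    "vm2": ["vol_sas"],
    "vm3": ["vol_sata", "vol_sas"],
}
subclients = {"SC-A": ["vm1", "vm2"], "SC-B": ["vm3"]}
print(len(mirror_volumes(subclients, vmdk_volumes)))  # 4 mirror volumes for 3 primary volumes
```

Here vol_sas is referenced by both subclients, so it is mirrored twice; with one subclient per volume each primary volume would map to exactly one mirror.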
For the restores and dumps to tape, I think it's intelligent enough to be able to find all of the required volumes/snapshots for the VM, and mount those into ESX for the duration of the restore or dump.