2017-08-16 07:11 AM
We are using vSphere 6.0.0 with Virtual Storage Console (VSC) 6.2.1 and are having problems with snapshots taken through VSC since a scheduled power cycle of the VM host on 15 July.
We have a number of backup jobs configured on hourly, daily and weekly schedules. The daily and weekly jobs complete with no problems, but the hourly job, which contains VMs that have more than one virtual hard disk on separate datastores, is failing with a quiescing error.
In VSC, we have selected the Perform VMware consistency snapshot option for all of the backup jobs/schedules. In vSphere, we can see that creating the VMware snapshot fails when the hourly job runs.
Creating snapshots directly from vSphere with the following options selected gives us the following results:
According to NetApp, having the Perform VMware consistency snapshot option ticked in VSC would be the same as having the following options in vSphere:
The results match, but the logic does not: I would have thought that ensuring the snapshot is consistent would mean including the memory in the snapshot data?
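To repeat the manual vSphere test from a script (for example, on the same hourly cadence, to see whether quiescing also fails outside VSC), note that the vSphere API exposes memory and quiescing as two independent boolean parameters of `CreateSnapshot_Task`. This is a minimal sketch, assuming pyVmomi and an already-retrieved `vim.VirtualMachine` object; `request_snapshot` and `wait_for_task` are hypothetical helper names, not part of pyVmomi or VSC.

```python
import time


def request_snapshot(vm, name, memory, quiesce, description=""):
    """Start a snapshot of a pyVmomi VirtualMachine (hypothetical helper).

    memory:  include the guest's RAM in the snapshot.
    quiesce: ask VMware Tools to quiesce the guest file system first
             (on Windows guests, Tools does this through VSS).
    Returns the vSphere Task object; the caller must wait on it.
    """
    return vm.CreateSnapshot_Task(name=name, description=description,
                                  memory=memory, quiesce=quiesce)


def wait_for_task(task, timeout_s=600, poll_s=2):
    """Poll a vSphere task until it succeeds, fails, or times out (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = task.info.state  # pyVmomi enum values compare equal to their names
        if state == "success":
            return task.info.result  # the new snapshot's managed object reference
        if state == "error":
            # A quiescing failure surfaces here as the task's fault.
            raise RuntimeError(f"snapshot task failed: {task.info.error}")
        time.sleep(poll_s)
    raise TimeoutError("snapshot task did not finish in time")
```

Calling `wait_for_task(request_snapshot(vm, "quiesce-test", memory=False, quiesce=True))` should correspond to the quiesced, memory-less combination, and swapping the two flags reproduces the other options from the vSphere client, one at a time.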
I have read the following two articles on snapshots:
All VMs have VMware Tools version 10.1.5, build 5055683 installed. Following discussions with VMware and their disclosure of a known issue with this version of Tools, we reinstalled Tools on one VM and this has been working fine since. We reinstalled Tools on another VM but this made no difference. Creating a brand new VM with two disks on separate NetApp datastores and the same version of Tools also works fine. Snapshotting VMs which do not have their disks on NetApp storage also works.
Turning off the Perform VMware consistency snapshot option in VSC makes the backups run fine, but turning it off is not an acceptable solution for us.
We are also using Backup Exec to back up some data on the VMs' disks, and apparently there is a known issue with the Backup Exec VSS Provider conflicting with the VMware Snapshot Provider. However, on investigating all of the failing VMs, we found no consistent pattern in which of the two providers is installed.
According to the NetApp article above, VSC simply "asks" VMware Tools to perform the snapshot. According to VMware, VMware Tools simply "asks" Microsoft VSS to perform the actual quiescing.
Can anyone offer any solutions or insight into this problem please?
Thanks in advance.
2017-08-16 08:53 AM
You are about to go down a rabbit hole of troubleshooting VMware Tools, VSC and VSS. For this reason, we gave up on quiescing VSC backups several years ago. We have never had an issue restoring a VM, and have even done some MS SQL machines without issue.
Quiescing was more trouble than it was worth, for us anyways.
2017-08-17 04:55 AM
Thanks for your response; this seems to be the general consensus. The puzzling thing is that we had it working fine a month ago!
How do you provide crash-consistency for your VMs then?
2017-08-17 10:10 AM
The backups are crash consistent, just not quiesced. I have done many VM restores this way and never once had an issue with data consistency. We back up databases separately with other tools. I have restored SQL VMs from the VSC backups for testing purposes, but wouldn't want to rely on that for production data.
We have since moved to CommVault, so haven't used VSC in about 6 months.
The issues we had with quiescing seemed to be mainly to do with time-outs waiting for the freeze/thaw cycles. We did work with support over the years, and things did improve, but it was still very common to see failures. 6.2.x was a lot better than previous releases, for us. We would get into states where the VSC jobs screen was all jacked up with partially failed jobs that were "stuck" running and never completed, and then the next run would not happen. That meant lots of manually editing .xml files on the VSC server and restarting services. 6.2.1 did solve most of that, but after years of pain, we finally gave up on quiescing with VSC.