2012-08-24 01:28 AM
For the past few days I have been experiencing problems when taking snapshots of virtual machines. The error is reproducible on W2K8 R2 machines. I haven't tested other operating systems yet.
Environment: VSC 4, vSphere ESXi 5 U1 cluster, NetApp FAS 3140, ONTAP 8.0.1 P5, shared storage (NFS)
A quiesced snapshot of a vm fails with the following error details:
The guest OS has reported an error during quiescing. The error code was: 5 The error message was: Asynchronous operation failed: VssSyncStart
The guest OS has reported an error during quiescing. The error code was: 3 The error message was: Snapshot operation aborted
The guest OS has reported an error during quiescing. The error code was: 3 The error message was: Error when notifying the sync provider.
The snapshot doesn't get deleted afterwards, so I have to consolidate the snapshot VMDKs by hand.
The only additional information I see is an event in the Windows system log: The disk signature of disk 1 is equal to the disk signature of disk 0.
Another thing is that the *.vmx file was modified and the disk mode changed: scsi0:1.mode = "independent-persistent" ???
What's going on here? I have never seen these issues before!
Any help is welcome!
2012-09-05 02:55 AM
It looks like the snapshot is still mounted and the VMDK of the snapshot is still attached to the VM (on scsi0:1).
Double-check it and manually remove the disk and the mounted snapshot, if required.
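One way to double-check this is to scan the VM's .vmx file for disk entries whose mode is independent. The following is only a minimal sketch, not a NetApp or VMware tool: the function name `find_independent_disks` and the sample file names and paths are made up for illustration, and only the key style (`scsi0:1.mode`, `scsi0:1.fileName`) follows what appears in a .vmx file.

```python
import re

# Matches per-device lines such as: scsi0:1.mode = "independent-persistent"
_VMX_LINE = re.compile(r'^\s*([A-Za-z]+\d+:\d+)\.(\w+)\s*=\s*"(.*)"\s*$')


def find_independent_disks(vmx_text):
    """Return {device: properties} for devices whose mode starts with "independent"."""
    devices = {}
    for line in vmx_text.splitlines():
        match = _VMX_LINE.match(line)
        if not match:
            continue  # not a per-device key, ignore it
        device, key, value = match.groups()
        devices.setdefault(device, {})[key] = value
    return {
        device: props
        for device, props in devices.items()
        if props.get("mode", "").startswith("independent")
    }


# Hypothetical .vmx excerpt; the file names and paths are illustrative only.
sample = '''\
scsi0:0.present = "TRUE"
scsi0:0.fileName = "vm01.vmdk"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "/vmfs/volumes/example-backup/vm01/vm01.vmdk"
scsi0:1.mode = "independent-persistent"
'''

leftovers = find_independent_disks(sample)
for device, props in sorted(leftovers.items()):
    print(device, props.get("fileName"), props.get("mode"))
```

Any device it lists is only a candidate for a leftover SFR disk. Check it against the VM's settings before removing anything, since a disk can also be set to independent on purpose.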
2012-09-05 04:13 AM
That's right. The snapshot still exists and is being used. Even if you consolidate the snapshot, the entry for the previously mounted backup still exists.
As a workaround I did the following:
Meanwhile I have opened a case.
I will post the result ;-)
2012-09-05 04:29 AM
So the main problem is that the unmounting does not work in the first place.
Have a look at the VMware logs when you unmount the snapshot in SFR; maybe they give some hints.
I think that it cannot remove the VMDK from the running VM to begin with, and this leads to a chain reaction:
-VMDK cannot be removed from the VM
-NetApp snapshot cannot be unmounted
-the following SMVI job fails because of the VSS error
-VMware snapshots are left over
Maybe in your environment/constellation you need to shut down your VM before dismounting the SFR snapshot, and then everything else may work as expected...
2012-09-05 06:20 AM
Shutting down the VM to avoid this cannot be the solution by design ;-)
I never had this issue with VSC 2.0 or 2.1, so this is probably a specific problem/behaviour with 4.0.
Our environment is set up following the best practice guides from NetApp/VMware, so nothing exotic.
Anyway, I'll wait for the support engineer's answer.
2012-09-05 08:31 AM
As Dominic said, it seems like SFR isn't cleaning up correctly, which is confusing the VSS writer in VMware Tools when it tries to quiesce. I suspect that if you did a disk rescan after an SFR session you would be fine: Windows would see that disk 1 no longer exists and everything would work. It should be doing that scan after we remove the disk, though... odd.
2012-09-06 01:30 AM
You are right. SFR isn't cleaning up correctly, and after unmounting the backup the remaining entries confuse the VSS writer even though there are no more mounted disks (and yes, I did rescan the disks in Windows).
Last night the NetApp SE sent me a link that suggests reinstalling VMware Tools without the VSS component, which I did today. Now SFR and SMVI are working again as expected, but the VMX still contains the entries for the SFR-mounted disk!
For me, this issue is W2K8-specific, because under W2K3 it has no effect.
2012-09-18 01:13 PM
Any luck? We're experiencing the same issues. Very similar situation:
We ran scheduled backups with the quiescing option enabled on a 2008 R2 SQL server, and they ran well for at least 6 months. We recently updated the environment from 4.1 to 5.0 Update 1, and now the server backup fails with "The guest OS has reported an error during quiescing. The error code was: 3 The error message was: Error when notifying the sync provider." We temporarily "fix" the issue by creating a snapshot, then going into the Snapshot Manager and performing a "Delete All", which clears all the extra disks... until the next backup, when the failure starts over again.
We're opening a case in the next day or two with VMware & NetApp at the same time. Will post with any progress.
Thanks for starting this discussion
2012-09-19 12:16 AM
The last information from NetApp was that they suggest I open a case with VMware.
I have already declined that and asked them to do further research!
I still think it is a VSC-related issue, and I am also waiting for an update from NetApp.
As soon as I have news I’m going to post it!