
SMVI: Cannot quiesce vm after sfr restore unless it is restarted

For a few days now, I have been experiencing problems when taking snapshots of virtual machines. The error is reproducible on W2K8 R2 machines; I have not tested other operating systems yet.

Environment: VSC 4, vSphere ESXi 5 U1 cluster, NetApp FAS3140, ONTAP 8.0.1P5, shared storage (NFS)

A quiesced snapshot of a VM fails with the following error details:

The guest OS has reported an error during quiescing. The error code was: 5 The error message was: Asynchronous operation failed: VssSyncStart

The guest OS has reported an error during quiescing. The error code was: 3 The error message was: Snapshot operation aborted

The guest OS has reported an error during quiescing. The error code was: 3 The error message was: Error when notifying the sync provider.

The snapshot is not deleted afterwards, so I have to consolidate these snapshot VMDKs by hand.

Scenario:

  1. Ran a scheduled backup with the quiescing option enabled (this ran well for almost a year with previous VSC versions, and SFR never caused problems). I recently updated to VSC 4.0 because I also updated my vSphere environment from 4.1 to 5.0 Update 1.
  2. Mounted the backup using SFR and could access it without a problem.
  3. Dismounted the backup and deleted the SFR session.
  4. Tried to take a snapshot again with the quiescing option enabled, which failed with the above error.
  5. Restarted the VM and took a snapshot again, which worked without a problem (on another W2K8 server, restarting did not fix it!)

The only additional information I see is an event in the Windows System log: "The disk signature of disk 1 is equal to the disk signature of disk 0."

Another thing: the .vmx file was modified and the disk mode changed to scsi0:1.mode = "independent-persistent". Why?

What's going on here? I have never seen these issues before!

Any help is welcome!

BR

Thomas

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

It looks like the snapshot is still mounted and the snapshot's VMDK is still attached to the VM (on scsi0:1).

Double-check it and manually remove the disk and the mounted snapshot if required.

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

That's right. The snapshot still exists and is in use. Even if you consolidate the snapshot, the entry for the previously mounted backup still exists.

As a workaround, I did the following:

  1. Consolidate the snapshot (vCenter > Snapshots > Consolidate)
  2. Shut down the VM
  3. Edit the .vmx file and delete the obsolete entries
  4. Start the VM again and everything is fine ( ... until you mount a backup again)

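Step 3 ("delete the obsolete entries") is where a slip is easy. The thread only names scsi0:1.mode = "independent-persistent" on the leftover SFR disk, so the sketch below assumes that every scsi0:1.* key belongs to that disk and is obsolete, and that the .vmx holds one key = "value" pair per line. independent_devices and strip_device are hypothetical helper names, not part of VSC or any VMware tool: the first lists devices whose mode is independent-*, the second returns the file text without a given device's keys.

```python
import re

# A .vmx line looks like:  scsi0:1.mode = "independent-persistent"
# Assumption: one key = "value" pair per line; other lines are left untouched.
VMX_LINE = re.compile(r'^\s*([A-Za-z0-9:._-]+)\s*=\s*"(.*)"\s*$')


def independent_devices(vmx_text):
    """Return device prefixes (e.g. 'scsi0:1') whose .mode value is independent-*."""
    devices = []
    for line in vmx_text.splitlines():
        m = VMX_LINE.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2)
        if key.lower().endswith(".mode") and value.lower().startswith("independent"):
            devices.append(key[: -len(".mode")])
    return devices


def strip_device(vmx_text, device):
    """Return vmx_text without any line whose key starts with '<device>.'.

    The trailing '.' keeps 'scsi0:1' from also matching 'scsi0:10'.
    """
    prefix = device.lower() + "."
    kept = []
    for line in vmx_text.splitlines():
        m = VMX_LINE.match(line)
        if m and m.group(1).lower().startswith(prefix):
            continue  # assumed obsolete entry for the leftover SFR disk
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Run it against a copy of the .vmx, compare the result with the original by eye, and only write it back while the VM is powered off, as in the workaround above.
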
Meanwhile, I have opened a case.

I will post the result ;-)

BR

Thomas

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

So the main problem is that the unmount does not work in the first place.

Have a look at the VMware logs when you unmount the snapshot in SFR; maybe they give some hints.

I think it cannot remove the VMDK from the running VM in the first place, and this leads to a chain reaction:

- The VMDK cannot be removed from the VM.
- The NetApp snapshot cannot be unmounted.
- The following SMVI job fails because of a VSS error.
- VMware snapshots are left over.

Maybe in your environment/configuration you need to shut down your VM before dismounting the SFR snapshot, and then everything else may work as expected...

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

Shutting down the VM to avoid this cannot be the intended solution ;-)

I never had this issue with VSC 2.0 or 2.1, so this is probably a problem/behaviour specific to 4.0.

Our environment is set up following the NetApp/VMware best-practice guides, so nothing exotic.

Anyway, I'll wait for the support engineers' answers.

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

Sure, shutting down cannot be the final solution.

But it would be a workaround and may help in troubleshooting the issue.

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

As Dominic said, it seems that SFR isn't cleaning up correctly, which confuses the VSS writer in VMware Tools when it tries to quiesce. I suspect that if you did a disk rescan after an SFR session, you would be fine: Windows would see that disk 1 no longer exists and everything would work. It should be doing that scan after we remove the disk, though... odd.

Keith

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

Hello Guys,

You are right. SFR isn't cleaning up correctly, and after unmounting the backup the remaining entries confuse the VSS writer even though there are no more mounted disks (of course I did rescan the disks in Windows).

Last night, the NetApp SE sent me a link suggesting to reinstall VMware Tools without the VSS component, which I did today. Now SFR and SMVI work as expected again, but the .vmx file still contains the entries for the SFR-mounted disk!

In my case, this issue is W2K8-specific, because under W2K3 it has no effect.

Thomas

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

Hi Thomas,

Any luck? We're experiencing the same issues in a very similar situation:

We ran scheduled backups with the quiescing option enabled on a 2008 R2 SQL server, and they ran well for at least 6 months. We recently updated the environment from 4.1 to 5.0 Update 1, and now the server backup fails with "The guest OS has reported an error during quiescing. The error code was: 3 The error message was: Error when notifying the sync provider." We temporarily "fix" the issue by creating a snapshot, then going into Snapshot Manager and performing a "Delete All", which clears all the extra disks... until the next backup, and then we start over again with the failure.

We're opening a case with VMware and NetApp at the same time in the next day or two. Will post any progress.

Thanks for starting this discussion

Regards,

Paolo

Re: SMVI: Cannot quiesce vm after sfr restore unless it is restarted

Hi Paolo,

The latest information from NetApp was that they suggested I open a case with VMware.

I declined and asked them to do further research!

I still think it is a VSC-related issue and am waiting for an update from NetApp.

As soon as I have news, I'll post it!

BR


Thomas