SMVI failures

I am running SMVI 2.0 with roughly 25 jobs spread out through the day and night.  Seemingly at random, one or two of these jobs fail.

SMVI log states the following: 2010-03-23 08:32:56,849         ERROR - VM "EVSQL" will not be backed up since VMware snapshot create operation failed.

The vCenter log states: Create virtual machine snapshot EVSQL Cannot create a quiesced snapshot because the snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.  VMware Tools is up to date.

hostd.log

[2010-03-23 08:30:45.782 F65FC6D0 verbose 'vm:/vmfs/volumes/4aafb0e2-37936ca0-d01a-001b78592710/EVSQL/EVSQL.vmx'] Tools version status: ok
[2010-03-23 08:30:45.782 F65FC6D0 verbose 'vm:/vmfs/volumes/4aafb0e2-37936ca0-d01a-001b78592710/EVSQL/EVSQL.vmx'] VMware Tools are current in guest: true
[2010-03-23 08:30:48.279 F65FC6D0 verbose 'vm:/vmfs/volumes/4aafb0e2-37936ca0-d01a-001b78592710/EVSQL/EVSQL.vmx'] Quiesced snapshot backup agent event: (vim.vm.BackupEventInfo) {
   dynamicType = <unset>,
   eventType = "keepAlive",
   code = 0,
   message = "",
}

It repeats the above message until about two minutes later, when it aborts (see the message below):

001b78592710/EVSQL/EVSQL.vmx'] Notifying completion of quiesced snapshot via backup agent.
[2010-03-23 08:32:39.408 F65FC6D0 verbose 'vm:/vmfs/volumes/4aafb0e2-37936ca0-d01a-001b78592710/EVSQL/EVSQL.vmx'] Tools version status: ok
[2010-03-23 08:32:39.408 F65FC6D0 verbose 'vm:/vmfs/volumes/4aafb0e2-37936ca0-d01a-001b78592710/EVSQL/EVSQL.vmx'] VMware Tools are current in guest: true
[2010-03-23 08:32:39.408 F65FC6D0 verbose 'vm:/vmfs/volumes/4aafb0e2-37936ca0-d01a-001b78592710/EVSQL/EVSQL.vmx'] Quiesced snapshot backup agent event: (vim.vm.BackupEventInfo) {
   dynamicType = <unset>,
   eventType = "providerAbort",
   code = 3,
   message = "Snapshot operation aborted",
}
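To measure how long the guest stayed frozen before the abort, the backup agent events above can be pulled out of hostd.log with a short script. This is only a rough sketch: the function names are made up for illustration, the sample lines are abbreviated from the excerpts above (the datastore path is shortened to `...`), and it assumes every event block follows the same layout as those excerpts.

```python
import re
from datetime import datetime

# Timestamp at the start of a hostd.log line, e.g. "[2010-03-23 08:30:48.279 ..."
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")
# The eventType field inside a BackupEventInfo block
EVENT_RE = re.compile(r'eventType = "(\w+)"')


def backup_agent_events(lines):
    """Return (timestamp, eventType) pairs for quiesced snapshot backup agent events."""
    events = []
    pending_ts = None  # timestamp of the event block currently being read
    for line in lines:
        ts = TS_RE.match(line)
        if ts:
            pending_ts = None
            if "Quiesced snapshot backup agent event" in line:
                pending_ts = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        ev = EVENT_RE.search(line)
        if ev and pending_ts is not None:
            events.append((pending_ts, ev.group(1)))
            pending_ts = None
    return events


def seconds_until_abort(events):
    """Seconds from the first keepAlive to the first providerAbort, or None."""
    first_keepalive = next((t for t, e in events if e == "keepAlive"), None)
    abort = next((t for t, e in events if e == "providerAbort"), None)
    if first_keepalive is None or abort is None:
        return None
    return (abort - first_keepalive).total_seconds()


# Abbreviated sample built from the log excerpts in this post
SAMPLE = [
    "[2010-03-23 08:30:48.279 F65FC6D0 verbose 'vm:/vmfs/volumes/.../EVSQL/EVSQL.vmx'] "
    "Quiesced snapshot backup agent event: (vim.vm.BackupEventInfo) {",
    "   dynamicType = <unset>,",
    '   eventType = "keepAlive",',
    "   code = 0,",
    "}",
    "[2010-03-23 08:32:39.408 F65FC6D0 verbose 'vm:/vmfs/volumes/.../EVSQL/EVSQL.vmx'] "
    "Quiesced snapshot backup agent event: (vim.vm.BackupEventInfo) {",
    "   dynamicType = <unset>,",
    '   eventType = "providerAbort",',
    "   code = 3,",
    "}",
]

if __name__ == "__main__":
    print(seconds_until_abort(backup_agent_events(SAMPLE)))
```

Running the same functions over the full hostd.log for a failed and a successful run would show whether the failed runs always hit the same time limit.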

If I run the SMVI job manually, it completes successfully.  I can also create VMware snapshots at will.  This is just one example, but one or two SMVI backups fail each day with the same error. As a side note, does SMVI snapshot the VM's memory, and does it quiesce the guest file system?  Both are options for VMware snapshots, and SMVI does not say which of them it uses.

Thanks for any insight.

Re: SMVI failures

Hi,

Failing VM snapshots seem to be a frequently recurring topic.

Looking at the name of your host - is it by any chance a SQL server with some LUNs connected via iSCSI?

If that's the case, have a look at this:

http://communities.netapp.com/message/23397#23397

Regards,
Radek

Re: SMVI failures

Hello,

Thanks for the response.  It is a SQL server running in a VM, but it uses Fibre Channel, not iSCSI.

Thanks,

Re: SMVI failures

Just a couple of things to check: when your scheduled jobs run, do they overlap each other? When a scheduled job fails, is the system busy? Are the failed jobs any different from the successful ones?

Thanks,

Wei

Re: SMVI failures

I have scheduled the SMVI jobs so that they do not overlap, although I don't think that should be an issue, as the backup process should take under a minute. There really don't seem to be any common traits among the jobs that are failing.  Some of the servers being backed up are front-end servers under no processor or memory strain, so they shouldn't be "busy."  All the backup options are the same other than the schedule time.  We aren't using scripts or anything special.

I noticed that the server which failed starts and stops a service every minute.  So perhaps this is why a VMware snapshot could never be successfully taken?  The only problem with the cause being on the client is that the backup succeeds all other times.  For instance, tomorrow this backup will not fail. So it occurs sporadically as I stated before.

Re: SMVI failures

Hi

According to VMware KB article #1009073, VMware Tools are unable to create quiesced snapshots of virtual machines that have NPIV RDM LUNs or Microsoft iSCSI Software Initiator LUNs mapped to them (this often results in timeout errors during snapshot creation). Is this the issue that you're seeing?

Regards

Amrita

Re: SMVI failures

As stated above we are using FC and I have never configured NPIV for the few RDMs we have so I don't think that's it either.  Thanks for the shot though.  Today none of the SMVI jobs failed. 

Re: SMVI failures

Originally the issue was "A general system error occurred: Protocol error from VMX". After reinstalling VMware Tools, the error became "Cannot create a quiesced snapshot because the snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine."

This required VMware support, as attempting to create a quiesced snapshot from Virtual Center was failing. The solution was to reinstall VMware Tools and follow this KB article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1019848

To manually register the VMware Snapshot Provider:

  1. Click Start > Run, type cmd, and click OK.
  2. Enter the following commands in sequence:

regsvr32 "C:\Program Files\VMware\VMware Tools\Drivers\vss\VCBRequestor.dll"

regsvr32 "C:\Program Files\VMware\VMware Tools\Drivers\vss\VCBSnapshotProvider.dll"

"C:\Program Files\VMware\VMware Tools\COMREG.EXE" -register "C:\Program Files\VMware\VMware Tools\Drivers\vss\VCBSnapshotProvider.dll" "VMware Snapshot Provider" "vmvss" "VMware Snapshot Provider"

If you see error 80110801 when attempting to register the COM application, you must delete the VMware Snapshot Provider COM application.
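If this has to be repeated on several guests, the three commands above could be wrapped in a small script. The following is only a sketch: the function names and the dry-run switch are my own, the paths assume the default VMware Tools install location shown above, and it has to be run inside the Windows guest from an administrator prompt. By default it only prints the commands.

```python
import subprocess

# Default VMware Tools install location, as used in the KB commands above
TOOLS_DIR = r"C:\Program Files\VMware\VMware Tools"


def registration_commands(tools_dir=TOOLS_DIR):
    """Build the three registration commands from the KB, in order."""
    vss_dir = tools_dir + r"\Drivers\vss"
    provider = vss_dir + r"\VCBSnapshotProvider.dll"
    return [
        ["regsvr32", vss_dir + r"\VCBRequestor.dll"],
        ["regsvr32", provider],
        [
            tools_dir + r"\COMREG.EXE",
            "-register",
            provider,
            "VMware Snapshot Provider",
            "vmvss",
            "VMware Snapshot Provider",
        ],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            # Show the command line roughly as it would be typed in cmd
            print(subprocess.list2cmdline(cmd))
        else:
            # Stop at the first command that returns a non-zero exit code
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(registration_commands())
```

Calling `run_all(registration_commands(), dry_run=False)` would actually execute the commands. Each regsvr32 call shows a confirmation dialog, so this is not fully unattended.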

Hope this helps. SMVI ran the job successfully after that.