VSC - Backups Hanging (In progress)

Hi,

I have VSC set to back up at datastore level, covering a mixture of virtual machines with both RDM and VMDK drives.

All RDM mappings are set to physical, so VSC won't try to back these up, which is expected; we use SnapManager for the data drives.

For some reason vCenter is "hung" on the following tasks:

Create virtual machine snapshot | TMLIVEDWHSRV-01 | 95% | Administrator | TMVCentreSrv01 | 01/04/2011 00:02:41 | 01/04/2011 00:02:41

NetApp Create Backup | VMFS5_OS | 45% | Administrator | TMVCentreSrv01 | 01/04/2011 00:00:18 | 01/04/2011 00:00:18

Reconfigure virtual machine | TMLIVEDWHSRV-01 | In Progress | Administrator | TMVCentreSrv01 | 01/04/2011 00:15:01 | 01/04/2011 00:15:01

* Does anyone know why there is a reconfigure event in this backup?


So I am in a situation where the datastore VMFS5_OS still has snapshots for every server that resides on it (20 VMs). There is 462.18GB free on that datastore (total size 900GB LUN / 1.14TB volume), so if this task doesn't clear before the weekend, the datastore could fill up.
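To put a rough number on that risk, here is a minimal sketch of the arithmetic. The free space and VM count come from this post; the per-VM snapshot growth rate is not known here, so the value in the usage note is purely a placeholder to be replaced with a measured figure.

```python
def hours_until_full(free_gb, vm_count, growth_gb_per_vm_per_hour):
    """Estimate hours until the datastore fills, assuming every VM's
    snapshot delta grows at the same constant rate (a simplification)."""
    if vm_count <= 0 or growth_gb_per_vm_per_hour <= 0:
        raise ValueError("vm_count and growth rate must be positive")
    total_growth_per_hour = vm_count * growth_gb_per_vm_per_hour
    return free_gb / total_growth_per_hour
```

For example, hours_until_full(462.18, 20, 0.5) uses the figures above with a hypothetical growth of 0.5GB per VM per hour; the real rate depends on how much each VM writes.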

Bit of info about the backup job:

Enabled options: Initiate SnapMirror update, Perform VMware consistency snapshot

Bit of info about DWH:

Server 2008

18GB RAM

SnapDrive 6.3 x64

SnapManager 5.1 SQL

Bit of Info on storage system:

FAS2040 Active-Active configuration

3 x DS4243 shelves SAS 600

I can see a link between large RAM size and quiesced snapshots, but why would this cause this issue in this case when previous backups have always worked?

Has anyone experienced these hanging jobs before, or does anyone know a fix or manual override to stop the job? I'm concerned that with no abort option we could end up with the job waiting to complete while the datastores fill.

Any help is appreciated!

Re: VSC - Backups Hanging (In progress)

I'm not sure why it would be reconfiguring a VM, very odd. What is in the SMVI.log file located in the install directory?

To do a manual override, you can stop and then restart the SMVI service on the SMVI server, then clean up the SMVI snapshots using the tool located here: http://blogs.netapp.com/virtualization/2010/02/cleaning-up-vmware-snapshots.html

However, if the task is stuck in vCenter, you may have to restart the vCenter service to "unstick" that one.

The SMVI log may indicate what is happening and why a reconfigure was launched.

Keith

Re: VSC - Backups Hanging (In progress)

Hi Keith,

Thanks for the reply,

I would not want to stop the snapshot operation, as it still has its .00001 files etc. for the host in question and I cannot afford to corrupt this server at any cost. Also, all of the other servers are currently locked, i.e. I can't manage the snapshots on them, as it looks like everything is waiting on the NetApp job to complete.

Here are some key moments in the server.log file:

2011-04-01 00:00:10,835 [backup:aaa62c31788ca8929ec1be562048b95b:] INFO  - A VMware consistency snapshot will be performed on Virtual Machine TMLIVEDWHSRV-01.

2011-04-01 00:00:18,097 [backup:aaa62c31788ca8929ec1be562048b95b:] WARN  - Virtual Machine TMLIVEDWHSRV-01 has disks attached via raw device mapping. These disks will not be backed up.

2011-04-01 00:00:18,097 [backup:aaa62c31788ca8929ec1be562048b95b:] WARN  - Virtual Machine TMLIVEDWHSRV-01 has disks attached via raw device mapping. These disks will not be backed up.

2011-04-01 00:00:18,103 [backup:aaa62c31788ca8929ec1be562048b95b:] INFO  - Backing up the following virtual machine(s)

2011-04-01 00:00:18,105 [backup:aaa62c31788ca8929ec1be562048b95b:] INFO  - The following virtual machines will have a VMware snapshot taken for consistency

2011-04-01 00:12:02,800 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:02,820 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:02,820 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:12,779 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:12,872 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:12,872 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:22,753 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:22,776 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:22,776 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:32,790 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:32,803 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:32,803 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:42,821 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:42,835 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:42,835 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:52,850 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:52,871 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:52,871 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:13:02,854 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING

All the way to present time...

There are no entries in the smvi.log for the backup time frame.
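For anyone wanting to measure how long a job has been stuck, here is a rough sketch that scans log lines in the format shown above and reports, per VM, the first and last "Waiting on VMware snapshot" entry and how many times it was logged. The regular expressions are my own reading of the excerpt, not a documented log format, and the function name is made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the excerpt above (an assumption, not a documented format).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[backup:(?P<job>[0-9a-f]+):\] "
    r"(?P<level>[A-Z]+)\s+- (?P<msg>.*)$"
)
WAIT_RE = re.compile(r"Waiting on VMware snapshot for VM (?P<vm>\S+)")


def snapshot_wait_times(lines):
    """Return {vm_name: (first_seen, last_seen, count)} for wait messages."""
    waits = {}
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip blank lines and anything in another format
        wait = WAIT_RE.search(parsed.group("msg"))
        if not wait:
            continue  # not a "waiting on snapshot" entry
        ts = datetime.strptime(parsed.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        vm = wait.group("vm")
        first, _, count = waits.get(vm, (ts, ts, 0))
        waits[vm] = (first, ts, count + 1)
    return waits
```

Feeding it the lines of the log file (for example via open(path) on wherever your server.log lives) and subtracting first_seen from last_seen gives the time the job has spent polling for that VM's snapshot.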

Re: VSC - Backups Hanging (In progress)

OK, for future reference: if anyone gets VMware stuck at 95% on creating a snapshot, here is the non-destructive method of fixing it.

1. Reboot the VM the snap is stuck on from within the OS.

It's simple to justify to the system owner: if you can't reboot, it's going to go down anyway when the datastore fills up, taking all the other servers with it.

That's it. I did this and the NetApp job completed within 20 seconds. Very weird; there seems to be a certain dislike of creating quiesced snapshots on VMs with large amounts of RAM. In our case the page files and RAM sit on a separate transient datastore that has no backup job on it.

Anyone else experiencing this error on VMs with large amounts of RAM?

Re: VSC - Backups Hanging (In progress)

How is your vmdk configured?

Re: VSC - Backups Hanging (In progress)

There seems to be an issue with the VMDK

Re: VSC - Backups Hanging (In progress)

Storage Spector,

The VMDK is stored in a shared datastore which is configured like so (from NetApp to VMware):

LUN mapped to ESXi host

LUN added to the ESXi host and formatted as a VMFS datastore

Datastore is then available on the ESXi cluster (we use vCenter to manage it)

VMFS05 Datastore is using a 1MB block size (256GB max filesize)

Server looks like:

Hard disk 1: Virtual (VMFS05 -  70GB - THIN)

Hard disk 2: Virtual (VMFS05 -  250GB - THIN)

Hard disk 3: Virtual (VMFS05 -  250GB - THIN)

Hard disk 4: Mapped Raw Lun (Pointer file on VMFS05)

Hard disk 5: Mapped Raw Lun (Pointer file on VMFS05)

Hard disk 6: Mapped Raw Lun (Pointer file on VMFS05)

VMDK is Thin provisioned, plenty of room (60%) in the datastore
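As a quick sanity check on those disk sizes against the file size limit, here is a small sketch. The 1MB block / 256GB limit is the figure quoted above; the assumption that the limit scales linearly for 2, 4 and 8MB blocks reflects the commonly cited VMFS-3 values and should be checked against VMware's documentation for your version. The function names are just for illustration.

```python
# From the post: a 1MB block size gives a 256GB maximum file size.
GB_LIMIT_PER_MB_OF_BLOCK = 256


def max_file_size_gb(block_size_mb):
    """Approximate max file size in GB for a given block size.

    Assumption: the limit scales linearly with block size (1, 2, 4, 8 MB).
    """
    if block_size_mb not in (1, 2, 4, 8):
        raise ValueError("unexpected block size: %r" % block_size_mb)
    return block_size_mb * GB_LIMIT_PER_MB_OF_BLOCK


def headroom_gb(disks_gb, block_size_mb=1):
    """Return {disk_name: GB remaining below the file size limit}."""
    limit = max_file_size_gb(block_size_mb)
    return {name: limit - size for name, size in disks_gb.items()}
```

With the three virtual disks listed above (70GB, 250GB, 250GB) and a 1MB block size, the two 250GB disks come out only 6GB under the limit, which leaves little margin for anything that needs a file slightly larger than the base disk.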

Re: VSC - Backups Hanging (In progress)

Can I ask what is running in the VM? The VMware VSS writer appears to be having trouble quiescing the VM prior to creating the snapshot. This is not that uncommon, but usually VMware times this out and fails the snapshot, which we then record as a warning in VSC/SMVI. Odd.

The question, though, is whether that VMware snapshot is really getting you anything. More and more I discourage their use with VSC, as the quiescing it does gives you little or no extra data integrity and can cause odd problems like this. Without the VMware snapshots, the VSC backups are nearly instant, with no load on the ESX servers and no performance impact on the storage. You can then take backups more often.

I usually build customers an hourly backup job with a retention of 2. This snaps the VMs hourly but only ties those blocks up for 2 hours, which costs very little disk space. They now have 2 very short recovery points.
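To illustrate what a retention of 2 on an hourly job means in practice, here is a toy simulation. It is only a model of the keep-the-newest-N idea, not how VSC implements retention, and the function names are invented for the example.

```python
from datetime import datetime, timedelta


def apply_retention(backups, keep=2):
    """Keep only the newest `keep` backup timestamps."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return sorted(backups)[-keep:]


def simulate_hourly(start, hours, keep=2):
    """Take one backup per hour, pruning after each one."""
    kept = []
    for h in range(hours):
        kept = apply_retention(kept + [start + timedelta(hours=h)], keep)
    return kept
```

Running simulate_hourly over any number of hours always leaves just the two most recent hourly points, so older snapshot blocks are released as soon as a new backup pushes them out.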

Back to your problem: if your VM is very busy or the app is behaving badly, you could try upgrading or refreshing VMware Tools, but if the problem continues you might want to either uninstall the VMware VSS writer or turn off VMware snapshots from the VSC console.

Any chance you have SnapDrive loaded in that VM for the RDMs?

Keith

Re: VSC - Backups Hanging (In progress)

The server is running:

  • Windows 2008 R1 64Bit
  • SQL Server 2008
  • SQL Analysis Services
  • Log Rhythm
  • DB Protect
  • SnapDrive 6.3.0.4601
  • SnapManager for SQL Server 5.1

SnapDrive will be dealing with the backups of the specific user data in future; at present it has been configured correctly via its configuration wizard but does not have any scheduled jobs.

My understanding of a quiesced snapshot was that it ensures you can boot the VM with no data corruption. If this is not the case, then I will simply remove the option to use quiesced snapshots.

Cheers

Re: VSC - Backups Hanging (In progress)

There is a known issue with the VSS writer of SnapDrive colliding with the VSS writer of VMware Tools. You will want to remove the option for quiesced snapshots.

Keith