VMware Solutions Discussions

VSC - Backups Hanging (In progress)

penningtonkr

Hi,

I have a situation where I have set VSC to back up at datastore level, with a mixture of virtual machines using RDMs and VMDK disks.

All RDM mappings are set to physical, so VSC won't try to back these up. That is expected: we use SnapManager for the data drives.

For some reason the virtual centre is "hung" on the following tasks:

Create virtual machine snapshot | TMLIVEDWHSRV-01 | 95% | Administrator | TMVCentreSrv01 | 01/04/2011 00:02:41 | 01/04/2011 00:02:41

NetApp Create Backup | VMFS5_OS | 45% | Administrator | TMVCentreSrv01 | 01/04/2011 00:00:18 | 01/04/2011 00:00:18

Reconfigure virtual machine | TMLIVEDWHSRV-01 | In Progress | Administrator | TMVCentreSrv01 | 01/04/2011 00:15:01 | 01/04/2011 00:15:01

* Does anyone know why there is a reconfigure event in this backup?


So I am in a situation where the datastore VMFS5_OS still has snapshots for every server that resides on it (20 VMs). There is 462.18GB free on that datastore (total size: 900GB LUN / 1.14TB volume), so if this task doesn't clear before the weekend, the datastore could fill up.

Bit of info about the backup job:

Enabled options: Initiate SnapMirror update, Perform VMware consistency snapshot

Bit of info about DWH:

Server 2008

18GB RAM

SnapDrive 6.3 x64

SnapManager 5.1 SQL

Bit of Info on storage system:

FAS2040 Active-Active configuration

3 x DS4243 shelves SAS 600

I can see a link between large RAM size and quiesced snapshots, but why would this cause this issue in this case when previous backups have always worked?

Has anyone experienced these hanging jobs before, or does anyone know of a fix or manual override to stop the job? I'm concerned that, with no abort option, the job could sit waiting to complete while the datastores fill up.

Any help is appreciated!


Re: VSC - Backups Hanging (In progress)

keitha

I'm not sure why it would be reconfiguring a VM, very odd. What is in the SMVI.log file located in the install directory?

To do a manual override, you can stop and then restart the SMVI service on the SMVI server and clean up the SMVI snapshots using the tool located here: http://blogs.netapp.com/virtualization/2010/02/cleaning-up-vmware-snapshots.html

However if the task is stuck in vCenter you may have to restart the vCenter service to "unstick" that one.

The SMVI log may indicate what is happening and why a reconfigure was launched.

Keith

Re: VSC - Backups Hanging (In progress)

penningtonkr

Hi Keith,

Thanks for the reply,

I would not want to stop the snapshot operation, as it still has its .00001 files etc. for the host in question and I cannot afford to corrupt this server at any cost. Also, all of the other servers are currently locked (I can't manage the snapshots on them), as it looks like everything is waiting on the NetApp job to complete.

Here are some key moments in the server.log file:

2011-04-01 00:00:10,835 [backup:aaa62c31788ca8929ec1be562048b95b:] INFO  - A VMware consistency snapshot will be performed on Virtual Machine TMLIVEDWHSRV-01.

2011-04-01 00:00:18,097 [backup:aaa62c31788ca8929ec1be562048b95b:] WARN  - Virtual Machine TMLIVEDWHSRV-01 has disks attached via raw device mapping. These disks will not be backed up.

2011-04-01 00:00:18,097 [backup:aaa62c31788ca8929ec1be562048b95b:] WARN  - Virtual Machine TMLIVEDWHSRV-01 has disks attached via raw device mapping. These disks will not be backed up.

2011-04-01 00:00:18,103 [backup:aaa62c31788ca8929ec1be562048b95b:] INFO  - Backing up the following virtual machine(s)

2011-04-01 00:00:18,105 [backup:aaa62c31788ca8929ec1be562048b95b:] INFO  - The following virtual machines will have a VMware snapshot taken for consistency

2011-04-01 00:12:02,800 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:02,820 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:02,820 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:12,779 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:12,872 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:12,872 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:22,753 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:22,776 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:22,776 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:32,790 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:32,803 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:32,803 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:42,821 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:42,835 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:42,835 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:12:52,850 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING
2011-04-01 00:12:52,871 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01
2011-04-01 00:12:52,871 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - FLOW-11012: Operation requested retry
2011-04-01 00:13:02,854 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Task state was: RUNNING

All the way to present time...

There are no entries in the smvi.log for the backup time frame.
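The retry loop in the log excerpt above can be spotted automatically. The following is a minimal Python sketch, not an SMVI tool: it assumes the log line layout shown in this post (timestamp, `[backup:<job id>:]`, level, message), and the function names `snapshot_wait_spans` and `stuck_waits` and the threshold values are hypothetical. It reports how long each job has been logging "Waiting on VMware snapshot" for a given VM.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt in this thread:
# 2011-04-01 00:12:02,820 [backup:<job id>:] DEBUG - <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"\[backup:(?P<job>[0-9a-f]+):\] "
    r"(?P<level>[A-Z]+)\s+- (?P<msg>.*)$"
)
WAIT_RE = re.compile(r"Waiting on VMware snapshot for VM (?P<vm>\S+)")


def snapshot_wait_spans(lines):
    """Map (job id, VM name) to (first seen, last seen, message count)."""
    spans = {}
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # not a backup log line in the assumed layout
        waiting = WAIT_RE.search(parsed.group("msg"))
        if not waiting:
            continue
        # Milliseconds are ignored; second resolution is enough here.
        ts = datetime.strptime(parsed.group("ts"), "%Y-%m-%d %H:%M:%S")
        key = (parsed.group("job"), waiting.group("vm"))
        first, _, count = spans.get(key, (ts, ts, 0))
        spans[key] = (first, ts, count + 1)
    return spans


def stuck_waits(lines, threshold_seconds=600):
    """List (vm, job, seconds waited, message count) at or above the threshold."""
    result = []
    for (job, vm), (first, last, count) in snapshot_wait_spans(lines).items():
        waited = (last - first).total_seconds()
        if waited >= threshold_seconds:
            result.append((vm, job, waited, count))
    return result


# Three lines copied from the excerpt above.
SAMPLE = [
    "2011-04-01 00:12:02,820 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01",
    "2011-04-01 00:12:12,872 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01",
    "2011-04-01 00:12:22,776 [backup:aaa62c31788ca8929ec1be562048b95b:] DEBUG - Waiting on VMware snapshot for VM TMLIVEDWHSRV-01",
]

if __name__ == "__main__":
    for vm, job, waited, count in stuck_waits(SAMPLE, threshold_seconds=10):
        print(f"{vm}: job {job} has waited {waited:.0f}s over {count} retries")
```

Run against the full server.log with a threshold of, say, ten minutes, this would flag a job that keeps retrying on a single VM long before the datastore starts to fill.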

Re: VSC - Backups Hanging (In progress)

penningtonkr

OK, for future reference: if anyone gets VMware stuck at 95% on creating a snapshot, here is the non-destructive method of fixing it.

1. Reboot the VM the snap is stuck on from within the OS.

It's simple to justify to the system owner: if you can't reboot, it's going to go down anyway when the datastore fills up, taking all the other servers with it.

That's it... I did this and the NetApp job completed within 20 seconds. Very weird; there seems to be a certain dislike of creating quiesced snapshots on VMs with large amounts of RAM. In our case the page files and RAM sit on a separate transient datastore that has no backup job on it.

Anyone else experiencing this error on VMs with large amounts of RAM?

Re: VSC - Backups Hanging (In progress)

STORAGESPECTOR

How is your vmdk configured?

Re: VSC - Backups Hanging (In progress)

STORAGESPECTOR

There seems to be an issue with the VMDK

Re: VSC - Backups Hanging (In progress)

penningtonkr

Storage Spector,

The VMDK is stored in a shared datastore which is configured like so (from NetApp to VMware)

LUN mapped to ESXi host

ESXi host added and formatted to VMFS datastore

Datastore is then available on the ESXi cluster (We use VCentre to manage)

VMFS05 Datastore is using a 1MB block size (256GB max filesize)

Server looks like:

Hard disk 1: Virtual (VMFS05 -  70GB - THIN)

Hard disk 2: Virtual (VMFS05 -  250GB - THIN)

Hard disk 3: Virtual (VMFS05 -  250GB - THIN)

Hard disk 4: Mapped Raw Lun (Pointer file on VMFS05)

Hard disk 5: Mapped Raw Lun (Pointer file on VMFS05)

Hard disk 6: Mapped Raw Lun (Pointer file on VMFS05)

VMDK is Thin provisioned, plenty of room (60%) in the datastore

Re: VSC - Backups Hanging (In progress)

keitha

Can I ask what is running in the VM? The VMware VSS writer would appear to be having trouble quiescing the VM prior to creating the snapshot. This is not that uncommon, but usually VMware times this out and fails the snapshot, which we then record as a warning with VSC/SMVI. Odd.

The question, though, is whether that VMware snapshot is really getting you anything. More and more I discourage the use of them with VSC, as the quiescing adds little or no data integrity and can cause odd problems like this. Without the VMware snaps, the VSC backups are nearly instant, with no load on the ESX servers and no performance impact on the storage. You can then take the backups more often.

For customers I usually build an hourly backup job with a retention of 2. This will snap the VMs hourly but only tie those blocks up for 2 hours, which costs very little disk space. They now have 2 very recent recovery points.

Back to your problem: if your VM is very busy or the app is behaving badly, you could try to upgrade or refresh the VMware Tools. If the problem continues, you might want to either uninstall the VMware VSS writer or turn off VMware snapshots from the VSC console.

Any chance you have SnapDrive loaded in that VM for the RDMs?

Keith

Re: VSC - Backups Hanging (In progress)

penningtonkr

The server is running:

  • Windows 2008 R1 64Bit
  • SQL Server 2008
  • SQL Analysis Services
  • Log Rhythm
  • DB Protect
  • SnapDrive 6.3.0.4601
  • SnapManager for SQL Server 5.1

SnapDrive will be dealing with the backups of the specific user data in future. At present it has been configured correctly via its configuration wizard but does not have any scheduled jobs.

My understanding of a quiesced snapshot was that it ensures you can boot the VM with no data corruption. If this is not the case, then I will simply remove the option to use quiesced snapshots.

Cheers

Re: VSC - Backups Hanging (In progress)

keitha

There is a known issue with the VSS writer of SnapDrive colliding with the VSS writer of VMware Tools. You will want to remove the option for quiesced snapshots.

Keith


Re: VSC - Backups Hanging (In progress)

penningtonkr

Consider it done

Re: VSC - Backups Hanging (In progress)

support_2

I didn't get a chance to read all of your problem; I was looking for some help on another issue.

I came up with a workaround for this problem a while ago that is better than taking crash-consistent snapshots (I mean, why bother to pay for SMVI if you are OK with that? Just schedule them through System Manager.)

VMware allows you to run scripts before and after a VMware snapshot, so the idea is to run a batch file that disables the Data ONTAP VSS provider --> take the VMware snapshot --> take the NetApp snap --> enable the VSS provider when VMware gets rid of its snap.

Look up pre-freeze and post-thaw scripts on VMware's site.

* You have to plan out the backups a little, as the Data ONTAP VSS provider needs to be enabled for SnapManager for SQL. This problem went away in later versions of VMware / SnapDrive.

Here are the batch files (they need to be modified for your environment):

::*************************************************************
:: This script disables the Data ONTAP VSS Service so
:: you can take a VMware snapshot. It should be run
:: as the pre-freeze script.
::*************************************************************

@echo off

:: Stop the following services: SnapDrive, SnapDrive Management Service, Data ONTAP VSS Hardware Provider
NET Stop SWSvc
NET Stop SDMgmtSvc
NET Stop navssprv

:: Unregister the Data ONTAP VSS Service
"D:\Program Files\NetApp\SnapDrive\navssprv.exe" -r service /u

******************next script*************************************

::*************************************************************
:: This script enables the Data ONTAP VSS Service so
:: you can take a SnapManager snapshot. It should be run
:: as the post-thaw script.
::*************************************************************

@echo off

:: Start the following services: SnapDrive Management Service, SnapDrive
NET Start SDMgmtSvc
NET Start SWSvc

:: Register & Start the Data ONTAP VSS Service
"D:\Program Files\NetApp\SnapDrive\navssprv.exe" -r service -a <service account> -p <service account's password>
NET Start navssprv

Hope this helps.

Re: VSC - Backups Hanging (In progress)

robertmidwest

We have the same problem and have been going back and forth with NetApp and VMware support. Very displeased with NetApp support on this. Their reference architecture document, TR-3785 (Microsoft Exchange Server, SQL Server, and SharePoint Server Mixed Workload on VMware vSphere 4, NetApp Unified Storage (FC, iSCSI, and NFS), and Cisco Nexus Unified Fabric), says they can use VSC with VMware snapshots on vSphere 4.0.0 with MS SQL 2008 running on Windows 2008 x64 Enterprise Edition SP2 (page 9). They say in the solution that they're using VSS, but only on the NFS solution (page 32).

For SMVI best practices, see NetApp TR-3737. SMVI leverages the VSS requestor in VMware Tools to create application-consistent backups. This is invoked as part of the VMware snapshot performed before creating the Snapshot copy on the NetApp array.

How reliable are those scripts for stopping services and unregistering the VSS provider and then reversing the issue?

I might have to try this.

I've had a couple of techs say it's a VMware bug, but they haven't been able to provide any evidence of this. But here's the thing: I can do a VMware snapshot fine in the VM. It appears to fail on the same piece using VSC if I specify VMware consistent snapshots, and it works fine if I don't select that.

A huge pain to work with since when it fails I have to do a several step 'cleanup' that requires the reboot of vCenter which ticks off the rest of the techs.

And don't get me started on response times, feedback and escalations......

Re: VSC - Backups Hanging (In progress)

robertmidwest

Is anyone having this problem on vSphere 4.1 or is it only vSphere 4?

Re: VSC - Backups Hanging (In progress)

support_2

The scripts work fine for me, but of course test them out, as they need to be modified. The idea was pretty simple: the first script stops the services and unregisters the provider, then the second script registers it and starts the services. You can run them manually on a server.

When you take a VM snapshot from vCenter, check the box to quiesce the file system and uncheck the box for memory --> this is the type of snapshot SMVI asks vCenter to take, and it is different from the defaults in VMware. If you are having the problem with the VSS providers, then your snapshot should time out at 95% with these settings.
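For anyone who wants to reproduce that snapshot type from a script instead of the vCenter client, here is a minimal Python sketch. It assumes `vm` is a pyVmomi `vim.VirtualMachine` already obtained from a connected session; `CreateSnapshot_Task` is the vSphere API method for taking a snapshot, while the helper name `request_quiesced_snapshot` and the `RecordingVM` stand-in (used only for a dry run without vCenter) are hypothetical. Whether this matches exactly what SMVI requests is taken from the description in this post, not from NetApp documentation.

```python
def request_quiesced_snapshot(vm, name, description=""):
    """Request a quiesced snapshot without guest memory.

    `vm` is assumed to expose CreateSnapshot_Task, as a pyVmomi
    vim.VirtualMachine does. The returned task still has to be
    monitored by the caller; this sketch does not wait on it.
    """
    return vm.CreateSnapshot_Task(
        name=name,
        description=description,
        memory=False,  # "memory" box unchecked: guest RAM is not captured
        quiesce=True,  # "quiesce" box checked: VMware Tools quiesces the guest file system
    )


class RecordingVM:
    """Hypothetical stand-in that records calls, for a dry run without vCenter."""

    def __init__(self):
        self.calls = []

    def CreateSnapshot_Task(self, **kwargs):
        self.calls.append(kwargs)
        return "fake-task"


if __name__ == "__main__":
    dry_run = RecordingVM()
    request_quiesced_snapshot(dry_run, "vss-test")
    print(dry_run.calls)
```

Running this against a test VM that has SnapDrive installed should show the same stall at 95% if the VSS provider conflict is present, without involving a VSC backup job.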

The VSS issue is a known bug, so don't imagine you will get much help. It is fixed in later versions of SnapDrive or VMware (not sure which, as I upgraded VMware and then SnapDrive and noticed that the problem was gone, but didn't bother to figure out which one fixed what). The problem also exists with some other backup software that lays down its own VSS provider (like Symantec's Backup Exec). Idk, this issue really falls into a no-man's-land type scenario. In the end, without SnapDrive installed, VMware snapshots will work fine. This problem occurs because of the way Windows handles VSS providers, how VMware looks for its VSS provider, and how NetApp installs theirs. So there is a lot of finger pointing.

VMware also has a driver from before Windows used VSS providers (this driver only exists for 2k3 and below; it was really created for Windows 2K). I forget the exact details because it was over a year ago when I was working on this problem, but you can also use this on your 2k3 systems.

I would not suggest moving to 4.1 or beyond SnapDrive 6.2.x; I had to downgrade my environment. There are some nasty bugs: your problem will be fixed, but SnapDrive is not able to see FC LUNs. They will still be connected to the VM/ESX and Windows will be fine, but SnapDrive will say something along the lines of not being able to enumerate them. There are also issues with later SnapDrive versions dealing with VMs that have more than 2 SCSI controllers. Again, that is a 4-6 month old problem; I think these only affect FC but can't be sure. It is on the forums, I will get that link, and it sucks.

We are running ESX 4.0, vCenter 4.1, SnapDrive 6.2.1, SnapManager for SQL 5.0.

Oh, and welcome to dealing with NetApp support. Ask to speak to a duty manager; they can usually provide some assistance in getting you some "better" help. Also tell them that you want the case to be a P2, but in my experience they will bust this down to a 3 the first chance they get. Feedback is never going to be there, at least with what I have dealt with; you have to call them up every day and force them to help you. If it makes you feel any better, the more times you have to call in to support, the better you will become with your environment; you will be forced to learn. NetApp is also a better product; its support is just really lacking, and if you pay for their upgraded support it prices them out of the competition (upgraded support will also take care of all your pain points, or so I have heard). I usually find quicker help on these forums, which is why I am making sure I start to contribute. If it makes you feel any better, you're not alone in your frustrations with support. We have seriously brought up the topic of going the EMC route with our next SANs because of support. That isn't much, but it might make you feel better to have that talk with your NetApp sales team; maybe if enough customers do it they will get the message, maybe not.

Re: VSC - Backups Hanging (In progress)

support_2

Same issue in another thread, with more detail on the scripts. I posted under a different username, Kris (udf); it's at the very bottom of the thread.

http://communities.netapp.com/thread/3015?tstart=3


4.1 problem

http://communities.netapp.com/message/45241
