VMware Solutions Discussions

Problem with vmware quiesced snapshot (and smvi) and iscsi rdm

f_duranti

I have a problem with quiesced file system snapshots on a VM that has iSCSI RDM LUNs connected through the ESX host instead of through the Microsoft iSCSI initiator in the guest.

The configuration is as follows:
ESX 4 Update 1 (with all patches except the ones released last Friday), using NFS for the VM operating system disks

NetApp 3160 with Data ONTAP 7.3.2P4

1 VM with:

Windows 2008 64-bit (fully patched)

Exchange 2007

SnapDrive 6.2P1

SnapManager for Exchange 6.0

I have 4 LUNs on 1 volume, created with SnapDrive and connected through ESX instead of through the Microsoft iSCSI initiator.

I also have a LUN with the descriptors for the RDMs on another volume.

When I try to take a snapshot of the VM with the option to quiesce the file systems, it gets to 95% and then exits after 15 minutes with errors.

Doing an Exchange backup with SnapManager works without any problem.

I tried disabling the VSS provider inside VMware Tools (which are also updated to the latest version), and in that case the snapshot works instantly, but I think that without the VSS provider the snapshot will not be consistent.

Is anyone using a similar configuration successfully with the VSS provider left on?


amritad

SMVI supports backup and recovery of virtual disks in VMFS and NFS datastores.
• The application SnapManager products support backup and recovery of applications whose data is stored on RDM LUNs or Microsoft iSCSI Software Initiator LUNs mapped to the virtual machine.
• By default, SMVI uses quiesced VMware snapshots of virtual machines to capture the consistent state of the virtual machines prior to making a Data ONTAP Snapshot copy of the backing storage.

According to VMware KB article #1009073, VMware Tools are unable to create quiesced snapshots of virtual machines that have NPIV RDM LUNs or Microsoft iSCSI Software Initiator LUNs mapped to them (this often results in timeout errors during snapshot creation). Therefore, customers using the Microsoft iSCSI Software Initiator in the guest and running SMVI with VMware snapshots turned on, which is not recommended, are at high risk of experiencing SMVI backup failures due to snapshot timeouts caused by the presence of Microsoft iSCSI Software Initiator LUNs mapped to the virtual machines.

VMware’s general recommendation is to disable both VSS components and the sync driver in VMware Tools (which translates to turning off VMware snapshots for any SMVI backup jobs that include virtual machines mapped with Microsoft iSCSI Software Initiator LUNs) in environments that include both Microsoft iSCSI Software Initiator LUNs in the VM and SMVI, thereby reducing the consistency level of a virtual machine backup to point-in-time consistency. However, by using SDW/SM to back up the application data on the Microsoft iSCSI Software Initiator LUNs mapped to the virtual machine, the reduction in the data consistency level of the SMVI backup has no effect on the application data.

Another recommendation for these environments is to use physical mode RDM LUNs, instead of Microsoft iSCSI Software Initiator LUNs, when provisioning storage in order to get the maximum protection level from the combined SMVI and SDW/SM solution: guest file system consistency for OS images using VSS-assisted SMVI backups, and application-consistent backups and fine-grained recovery for application data using the SnapManager applications.

(Source: SnapManager 2.0 for Virtual Infrastructure Best Practices, p. 30)

Does this make sense? And what exactly fails in your environment?

Regards

Amrita

f_duranti

My problem is being able to take a quiesced snapshot of the operating system, so that I can be sure it is consistent.

I've read that KB, but it only talks about Microsoft iSCSI and not about RDM LUNs.

The strange thing is that on my other 2 Exchange servers (which use Microsoft iSCSI) I'm able to take a quiesced file system snapshot without any problem.

On the machine where I get the problem, it seems to be half a VMware problem and half a SnapDrive problem.

When taking the snapshot on the machine with the Microsoft iSCSI initiator, I see that:

1) Data ONTAP VSS adds the source

2) snapshot LUNs are created on the storage by SnapDrive

3) LUNs are mapped by SnapDrive

4) target LUN and snapshot are deleted by SnapDrive

5) VM snapshot completes successfully

On the machine with RDM and without the Microsoft software iSCSI initiator, I see:

1) Data ONTAP VSS adds the source

2) snapshot LUNs are created on the storage by SnapDrive

3) LUNs are mapped by SnapDrive

here there is a 15-minute wait

4) I start to get errors from the VSS provider:

-VSS 12362 - A Shadow Copy LUN was not detected in the system and did not arrive.

5) SnapDrive reports that the target LUN delete succeeded

6) SnapDrive reports an error in DeleteSnapShot:

-snapdrive eventid 245:

DeleteSnapShot operation failed.

Storage System Name = storage1

lunPath = /vol/exchlab_2007/exchlab2007
Snapshot copy Name = {69f15de6-e63e-4b25-907f-35202bf40fa4}
Error code = 0xc004030b
Error description = An attempt to delete Snapshot copy '{69f15de6-e63e-4b25-907f-35202bf40fa4}' of the 'exchlab_2007' volume failed on the storage system 'storage1'. Error code: 16. Error description: LUN clone.
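
For anyone who wants to pull the fields out of an event like the one above (for example, to attach to a support case), here is a minimal Python sketch. The `parse_event` helper is hypothetical and not part of SnapDrive; the event text is copied from this post, and the "key = value" layout is assumed from this single example.

```python
import re

# Event text copied from the post above (SnapDrive event ID 245).
EVENT_TEXT = """DeleteSnapShot operation failed.
Storage System Name = storage1
lunPath = /vol/exchlab_2007/exchlab2007
Snapshot copy Name = {69f15de6-e63e-4b25-907f-35202bf40fa4}
Error code = 0xc004030b
Error description = An attempt to delete Snapshot copy '{69f15de6-e63e-4b25-907f-35202bf40fa4}' of the 'exchlab_2007' volume failed on the storage system 'storage1'. Error code: 16. Error description: LUN clone.
"""


def parse_event(text):
    """Split 'key = value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines without a 'key = value' shape
            fields[key.strip()] = value.strip()
    return fields


fields = parse_event(EVENT_TEXT)

# The nested "Error code: N. Error description: ..." at the end of the
# description is the reason reported by the storage system itself.
match = re.search(
    r"Error code: (\d+)\. Error description: (.+?)\.?$",
    fields["Error description"],
)
storage_code = int(match.group(1))
storage_reason = match.group(2)

print(fields["lunPath"], storage_code, storage_reason)
```

Here the storage-side reason is "LUN clone", which may mean the Snapshot copy was still backing a LUN clone when the delete was attempted. That reading is an assumption and is not confirmed anywhere in this thread.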

So as I understand it, SnapDrive in this case is either not able to complete step 3 (mapping the LUN) or is not able to see the snapshot of the VM.
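
The two sequences above can be compared mechanically to show where the RDM case departs from the Microsoft iSCSI case. This is a minimal Python sketch: the step names are paraphrased from the lists above, and `first_divergence` is a hypothetical helper written for illustration.

```python
from itertools import zip_longest

# Steps observed with the Microsoft iSCSI initiator (paraphrased from the post).
ms_iscsi_steps = [
    "vss provider adds the source",
    "snapdrive creates snapshot luns",
    "snapdrive maps luns",
    "snapdrive deletes target lun and snapshot",
    "vm snapshot completes",
]

# Steps observed with RDM LUNs presented through ESX (paraphrased from the post).
rdm_steps = [
    "vss provider adds the source",
    "snapdrive creates snapshot luns",
    "snapdrive maps luns",
    "15 minute wait",
    "VSS 12362: shadow copy lun did not arrive",
    "snapdrive deletes target lun",
    "snapdrive DeleteSnapShot fails (event 245)",
]


def first_divergence(a, b):
    """Return (index, step_a, step_b) at the first differing step, or None."""
    for index, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return index, x, y
    return None


result = first_divergence(ms_iscsi_steps, rdm_steps)
print(result)
```

Both runs agree up to and including the LUN mapping step and split right after it, which matches the suspicion that the mapped snapshot LUN never shows up in the guest.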

It would be great if SnapDrive did not even try to snapshot those disks when the VSS provider is invoked for a quiesced VM snapshot, but it tries and fails in the case of RDM LUNs (whereas it works fine in the case of Microsoft iSCSI initiator LUNs).

We are trying to move to external RDMs for DR, using SD 6.2/SM 6.0 to back up/mirror the Exchange data, SMVI 2.0 to back up/mirror the machine, and SRM 4 (with the 1.4.3 NetApp adapter when it is out) to restart the environment at the DR site, but with these errors our Exchange server will not have its OS disk quiesced/consistent.

Regards

Francesco

amritad

Hi

You should file a support case to check the exact problem that SnapDrive is having. There are certain KB articles available on NOW for this as well.

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb27084

Regards

Amrita

f_duranti

I will probably do it.

I was asking here to find out whether anyone has the same configuration without problems, or whether this is a known malfunction. Before posting I searched here, on NOW, and on the VMware KB/forums for a solution, but I only found a suggestion here to disable the VMware VSS snapshot provider, and that does not seem like a "good solution"...

The problem seems to be partly SnapDrive related, but it's also strange because SnapManager/SnapDrive backups on that server work without errors, so it seems related to the interaction between SnapDrive and the VMware Tools VSS provider. I hope that I will not be bounced from NetApp to VMware and from VMware to NetApp to solve an "in between" problem...

Regards

Francesco

douglas_helm

Did you find a resolution to this problem? I am experiencing the exact same thing, and I am in line with TR-3737 section 9.2.3, "OS Is on VMDK, Application Data Is on Physical Mode RDMs". We are not using NPIV and we aren't using iSCSI, but the Microsoft iSCSI initiator is installed on the server.

We did use the Microsoft iSCSI initiator during data migration to the NetApp, but there are no LUNs mapped through iSCSI.

My next step will be to stop the iSCSI service and see if that is what is hanging it up. If it works with the iSCSI service stopped, then I will uninstall MS iSCSI.

amritad

Hi

VMware's general recommendation is to disable the VSS components and the sync driver in VMware Tools in such configurations, reducing the consistency level of the virtual machine snapshot to point-in-time consistency. Given that there is no consistency advantage to creating non-quiesced VMware snapshots over just creating Data ONTAP snapshots of the backing storage for the virtual machine, the recommendation for SMVI users is to disable VMware snapshots for the backup jobs that back up virtual machines using Microsoft iSCSI Initiator LUNs, relying on Data ONTAP snapshots to provide point-in-time consistency. SMVI backups do not extend to application data stored on RDMs or iSCSI LUNs, so there is no reduction in the data consistency level for the application data stored on external LUNs and protected by SnapManager.
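
As a rough illustration of the recommendation above, the choice of whether an SMVI backup job should use VMware snapshots can be sketched in Python. The `VM` record and the `use_vmware_snapshot` function are hypothetical names made up for this example and do not correspond to any SMVI API. The rule only encodes the KB 1009073 limitation as quoted in this thread (NPIV RDM LUNs or Microsoft iSCSI Software Initiator LUNs).

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical inventory record, for illustration only."""
    name: str
    guest_iscsi_luns: bool  # Microsoft iSCSI Software Initiator LUNs mapped in the guest
    npiv_rdm_luns: bool     # NPIV RDM LUNs mapped to the VM


def use_vmware_snapshot(vm: VM) -> bool:
    """Return False for VMs covered by the quoted limitation, True otherwise."""
    return not (vm.guest_iscsi_luns or vm.npiv_rdm_luns)


vms = [
    VM("exch-iscsi", guest_iscsi_luns=True, npiv_rdm_luns=False),
    VM("fileserver", guest_iscsi_luns=False, npiv_rdm_luns=False),
]

# Map each VM name to whether its SMVI job would keep VMware snapshots on.
plan = {vm.name: use_vmware_snapshot(vm) for vm in vms}
print(plan)
```

Note that the original poster's case (iSCSI RDM LUNs presented through ESX) is not covered by this rule, so a check like this would still leave VMware snapshots enabled for that VM.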

And here is the VMware KB article that talks about this VMware limitation

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009073

Regards

Amrita

miguel_maldonado

Hello,

We were having a similar issue with SMVI and RDM. We have a Microsoft Exchange 2010 cluster. On the mailbox VMs, the disks that store the Exchange databases are mapped through VMware RDM rather than through the Microsoft iSCSI initiator. Given this configuration, what we did to allow SMVI snapshots was to exclude the RDM disks and have SnapManager for Exchange take the snapshots of these RDM disks. On our side, crash-consistent VM snapshots could only be made on "only NFS datastores", and there was no way of making them with RDM disks. On the other hand, once we created the virtual machine snapshot (with one NFS disk and one RDM), we could only recover the first disk, not the entire VM. What we did to get a complete copy of these mixed-disk VMs was to schedule a NetApp snapshot at the same time as the SMVI snapshot. This way, if the VM files somehow get corrupted, we can recover them from the NetApp snapshot, and if the hard disk of the VM gets corrupted, we can recover it from the SMVI hard disk snapshot.

We also encountered a problem with SnapDrive and a W2K3 machine. To resolve it, we updated the patches on the W2K3 machine and rebooted it, and then SMVI started working again. This VM was the vCenter server, which has SnapDrive installed to mount SnapManager for Exchange snapshots for single mailbox recovery.

We also tried to disable VSS, but I believe that if you disable it, SMVI will not make a crash-consistent snapshot.

Hope this helps.

Best regards,

Miguel

m_schuren

Hi everybody,

Interesting thread.

I have been seeing this problem for a long time now, and the effects and error messages depend very much on the VMware Tools version, as well as on the applications installed in the guest OS.

To summarize, my idea is:

- Have VSS-assisted quiesced OS Disk (VMDK) in all SMVI backups

- Have VSS-assisted quiesced DB/Log Disks (RDMs) in all SME/SMSQL/SMMOSS/SMO backups

In real life it turns out that the combination of both does not work in many scenarios, and you are forced to disable both VSS AND the sync driver within the problematic VM. So you lose "guaranteed consistency" for the OS disk.

The VMware Tools VSS driver (depending on version) seems to interfere with the Data ONTAP VSS provider (also depending on version) when some specific VSS writers (applications like Exchange, also depending on version) are installed. Some combinations seem to work with MS iSCSI, some seem to work with iSCSI or FC RDMs....

However, the main question (to me) is: why does the NetApp VSS provider (which is needed for backing up the app-data RDM drives) kick in at all when the VMware Tools VSS driver tries to quiesce the OS drive (which is NOT on an RDM at all)? It seems to be a problem in the way VSS is designed. VMware should not mess with the Data ONTAP VSS provider, but should use the MS software provider for the OS quiescing, right?

Since I cannot answer or solve this question, I do not see any alternative to disabling both VSS AND the sync driver in the guests that have these problems. This leaves the OS drive's snapshot in a non-flushed, "crash-consistent" state, which is of course better than nothing. I think in real life most people can live with it. I've personally never seen such an "inconsistent" Windows OS image refuse to boot after restoring it. NTFS is a journaling file system, hey, that's what it's there for.

So, as long as the application data's backups are fully consistent (and they are!), this seems to be not a big problem in real life.

But of course there is one issue: it's hard to explain to customers that "your application data is guaranteed consistent, but your OS drive is not really - but it works" 😉

Just my 2 cents.

hector_yuen

Hello,

How do you disable the VSS provider AND/OR the sync driver? Would you have to disable the VSS service? Is there a way to call VMware snapshots without having them snap every single volume they see, so that they snap only the volumes that are on VMware storage?

Thanks

Hector

thomas_glodde

Hector,

You have to rerun the VMware Tools installer and select Modify during setup, then deselect the VSS and sync driver components. After a reboot, both are gone.

Not sure about your VMware snapshot question tho, what do you want to snap exactly?

Kind regards

Thomas
