Our team is very interested in the questions posted about SMVI functionality, configuration, and automation … thanks for taking the time to share. And please keep the questions and feedback on your experiences working with SMVI, good or bad, coming. Good feedback makes for a good day, bad feedback makes for a better product. :-)
I also wanted to let you know that NetApp is hosting an SMVI/SRM Webcast on February 19 focusing on data protection in a VMware environment: NetApp SMVI for backup/restore and VMware SRM for disaster recovery. SDDPC Sys Admin Rick Scherer - who designed and maintains a 25 host VMware ESX 3.5 farm with well over 300 Virtual Machines, plus writes a great blog (http://vmwaretips.com/wp/) - will be joining us to describe how his team uses SMVI. There will also be a panel of folks, including best practices authors and reference architects, to address questions submitted via chat.
SMVI Product Manager
I am in the UK and it will be late in the evening here when this event starts. Will the streamed content be recorded and available via the NOW site after the event?
We are currently using OSSV to back up our VMs but will be changing to SMVI soon, and it would be good to know about some of the "holes in the road" before we hit them.
We have been running SMVI 1.0.1 for several weeks now and it is pretty stable (compared to 1.0!). Last night, two of our daily backup jobs started off, and they are still in a running state, producing event 4096 entries in the application log like these:
First, an error:
2936921 [backup2 6778732c8323002d813f32f6dab0368e] ERROR com.netapp.common.flow.JDBCPersistenceManager - FLOW-10110: Lock "50228a39-7eba-0c39-5d7c-bb5b34be305f" already held by backup-create operation 9a80d4f1bc7f39fa6e4cabb0559a097a 
Then a warning:
2936921 [backup2 6778732c8323002d813f32f6dab0368e] WARN com.netapp.smvi.task.AcquireVirtualMachineLockWrapper - Could not lock all virtual machines. The lock process will continue to be retried until it succeeds.
Yesterday afternoon I was trying to restore a test VM, which failed! Could that perhaps be disturbing the backup jobs?
I cannot stop the two backup jobs. I have restarted the vCenter server (where SMVI is also installed), but nothing helps. I cannot start new jobs either, so we are somewhat locked up!
I hope someone can give me a hint on how to stop this...
Thanks in advance
I get an error: "Solution does not exist in this knowledge base" when clicking the link!
MCSE / CCEA / VCP
Tel +41 32 387 82 19
Fax +41 32 387 81 11
Visit our homepage: www.in4u.ch
Oh - we found out why the scheduled jobs were still running all night long until 30 minutes ago! One of the volumes on the DR filer had been removed because the storage team thought we didn't need it anymore. But the SMVI jobs trigger a SnapMirror update, and that was failing the whole time.
Again, it would be very nice to know how we can stop these running jobs. As I said, we tried reinstalling SMVI, clearing all sorts of temp files, and rebooting the VC server several times. These jobs just cannot be killed!
I'm glad you found the cause on your end.
You can clear out ALL running tasks by using the following steps (from that KB article):
SnapManager for VI utilizes an internal database to keep track of these locks and provides persistence across reboots. Simply rebooting the SnapManager for VI host will not clear these locks.
If you want to remove all currently running tasks in SMVI, perform the following:
- Stop SnapManager for VI service.
- Remove the <SMVI dir>/server/crashdb directory.
- Start SnapManager for VI service.
Performing these steps will not affect the scheduled jobs nor remove them from the interface. It will kill and remove any outstanding or in-process tasks.
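For anyone who has to do this repeatedly, the steps above can be sketched as a small script. The install path and the Windows service name below are assumptions (check `services.msc` and your actual install directory before running anything like this):

```python
import os
import shutil
import subprocess

# Assumed values - verify against your own installation.
SMVI_DIR = r"C:\Program Files\NetApp\SMVI"
SERVICE_NAME = "SnapManager for Virtual Infrastructure"

def crashdb_dir(smvi_dir):
    """Location of SMVI's internal task/lock database, which persists
    in-flight operations across reboots."""
    return os.path.join(smvi_dir, "server", "crashdb")

def clear_stuck_tasks(smvi_dir=SMVI_DIR, service=SERVICE_NAME):
    """Stop the service, remove the crashdb directory, restart the service.
    Scheduled jobs are untouched; only outstanding tasks are dropped."""
    subprocess.run(["net", "stop", service], check=True)
    shutil.rmtree(crashdb_dir(smvi_dir))
    subprocess.run(["net", "start", service], check=True)
```

This is only a convenience wrapper around the three manual steps, not an officially supported tool.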
It looks like we are running into the same errors, although we are running SMVI 2.0 instead of 1.x.
It starts with: "348727703 [backup1 4aee7659f7e1fae68fcd68caf4dc1038] WARN com.netapp.smvi.task.AcquireVirtualMachineLockWrapper - Could not lock all virtual machines. The lock process will continue to be retried until it succeeds."
Then we get: "348738986 [backup1 751dfe6bb3fc9a8b9d69c840d97002d8] ERROR com.netapp.common.flow.JDBCPersistenceManager - FLOW-10110: Lock "757d6ef9-eb32-4ced-9752-b8a9c0f4f112" already held by backup operation 2a100a770e2071475436c338d2586ba3 "
Several jobs stay in a running state and the SnapMirror relationships are not updated. VMware snapshots initiated by SMVI are also not cleaned up correctly: the _recent snapshot does get renamed, but no new _recent snapshot is created.
Although stopping the SMVI service and cleaning up the crashdb directory seems to help, it does not really solve the problem. Having to do this every other day is not a real solution.
Any suggestions on how to solve this problem?
VMware snapshots can cause issues in certain environments. If the datastore the virtual machines reside on experiences heavy disk I/O, VMware snapshots can take a long time to create and may eventually time out and fail. Although this occurs at the VMware level, SMVI depends on the VMware snapshots for a backup to complete successfully.
If this issue is encountered during VMware snapshot creation, whether initiated manually through vCenter or through SMVI, the administrator must reduce the number of concurrent VMware snapshots, reduce the amount of disk I/O, or eliminate the VMware snapshots from the SMVI backup process.
We also recommend installing the latest VMware Tools on the virtual machines to enable successful backups, and aligning VMs according to TR-3749 and TR-3747.
I have a question/problem with regard to restores. I successfully backed up a test VM and then deleted it from disk. When I attempt to restore it, I get the following error:
=== CLIENT ===
OS Name=Windows 2003
=== ERROR ===
=== MESSAGE ===
Error restoring backed up entity
=== DETAILS ===
Could not locate or create an initiator group on storage system "vanna01" for ESX server "192.168.16.65". Please ensure the ESX server has one or more initiators logged into the storage system.
=== CORRECTIVE ACTION ===
=== STACK TRACE ===
com.netapp.nmf.smvi.main.SmviErrorDetailException: Error restoring backed up entity
at java.awt.event.InvocationEvent.dispatch(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
The ESX host in question is listed in VC as "192.168.16.65"
The initiator group is called ISCSI-VANVS05
The initiator node name is iqn.1998-01.com.vmware:vanvs05-5ae8d131
The initiator alias is vanvs05.van.davis.ca
The initiator is logged into the storage system
Is there something that is supposed to match up between the host/IP/name used to log in via SSH in SMVI and the igroup that I am not aware of?
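For what it's worth, the error text suggests the lookup keys on the initiator node names (IQNs) logged into the storage system, not on the ESX host's vCenter display name, IP, or the initiator alias. A minimal sketch of that kind of membership check (the matching rule is my reading of the error message, not documented SMVI behavior):

```python
def igroup_contains_host_initiator(igroup_members, host_iqns):
    """Return True if any initiator node name (IQN) reported by the ESX
    host appears among the igroup's members. Note the comparison uses
    IQNs only - display names, IPs, and aliases play no part."""
    members = set(igroup_members)
    return any(iqn in members for iqn in host_iqns)

# Values from this post: the igroup holds the host's IQN, so a pure
# IQN membership check would succeed even though the host is listed
# in vCenter by IP address.
igroup_members = ["iqn.1998-01.com.vmware:vanvs05-5ae8d131"]
host_iqns = ["iqn.1998-01.com.vmware:vanvs05-5ae8d131"]
```

If the check fails in practice, the usual suspects are the initiator not actually being logged in at backup/restore time, or the igroup living on a different vFiler/controller than the one SMVI is talking to.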
I have a customer running SMVI 1.0.1 on a 6080, running hundreds of VMs.
When we run a backup on a datastore holding 15 guests, SMVI creates 1 snapshot per machine, which makes it impossible for the customer to keep the snapshots on the main site for more than a few days (255 max snapshots / 15 per day = 17 days).
The customer got burned in the past with a virus attack that took a while to discover and wants to retain the snapshots for longer. Is it possible to make SMVI take 1 snapshot of all 15 machines? (We are aware that all machines will need to be in hot backup mode and that this will affect performance for the duration of the backup window.)
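For sizing the retention, the limiting factor is ONTAP's cap of 255 Snapshot copies per volume; the achievable retention is just that cap divided by the snapshots consumed per day. A one-line helper (daily schedule assumed):

```python
MAX_SNAPSHOTS_PER_VOLUME = 255  # ONTAP limit on Snapshot copies per volume

def retention_days(snaps_per_backup, backups_per_day=1):
    """Whole days of backups that fit under the per-volume snapshot cap."""
    return MAX_SNAPSHOTS_PER_VOLUME // (snaps_per_backup * backups_per_day)
```

With 15 snapshots per daily run this gives 17 days; collapsing to a single ONTAP snapshot per run (snaps_per_backup=1) would allow up to 255 daily backups, long before which retention policy, space, and SnapMirror considerations take over.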
SE, Strategic Accounts,
It sounds like your customer is creating separate SMVI backups for each VM. For their case, I would suggest looking at creating a smaller number of backups with more VMs. They could try a single backup of just that datastore. SMVI will, by default, create VMware snapshots for every VM in the datastore, then it will create a single ONTAP snapshot for the datastore(s) involved.
Please be aware that if the VMs are experiencing heavy I/O, it is possible for one or more of the VMware snapshots to fail. In those cases, the VMs that did not complete their VMware snapshots will not be properly backed up. If this occurs, the two resolutions are to either a) disable VMware snapshots, or b) reduce the number of VMs per SMVI backup. If the choice is to reduce the number of VMs per backup, I would start by just dividing in half and testing.
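If you do go the route of reducing VMs per backup, splitting the datastore's VM list into fixed-size groups is straightforward; a sketch (the group size is something to tune against snapshot timeouts, starting at roughly half the current count):

```python
def split_into_backup_jobs(vms, max_per_job):
    """Split a list of VM names into smaller SMVI backup jobs so that
    fewer concurrent VMware snapshots are requested per run."""
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")
    return [vms[i:i + max_per_job] for i in range(0, len(vms), max_per_job)]

# Example: 15 VMs split roughly in half -> two jobs of 8 and 7 VMs.
jobs = split_into_backup_jobs([f"vm{n:02d}" for n in range(15)], 8)
```

Each resulting group would become its own SMVI backup job, scheduled back to back; the grouping logic itself is trivial, the tuning is the real work.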
Appreciate the quick response. Apparently there was a misunderstanding between the customer and me:
He has 17 LUNs (300 GB each, 500 VMs total) inside one volume, and 17 backup jobs (at the datastore level) running one after the other. He doesn't want to back up all the machines in one snapshot, fearing that 400 quiesce requests would overload the servers, and that's why he's getting 17 snapshots per backup.
He claims that our best practices for SMVI were the reason he chose a LUN size of 300 GB, which is what led to the need for 17 LUNs. I realise that we can ask them to move some of the LUNs to other volumes, but that would require downtime for some of the servers. Is there another way to work around this issue?
That is a lot of LUNs in a single volume. There are probably a few ways to reorganize the customer's data as long as they have the spare storage. Now, please be aware that I'm not the most qualified to give the exact steps on this. I would also suggest subscribing to dl-server-virtualization and asking this same basic question there if you do not hear anything else on this thread.
The root issue is that there are multiple VMFS datastores backed by LUNs all in the same volume. Finding the best solution to break this apart so that there is no downtime is the trick.
The basic way I can see resolving this for this customer is through using VMotion and/or Storage VMotion. They should be able to bring up another VMFS datastore backed by another LUN in another volume. Then use VMotion to move several VMs over to the new VMFS datastore. I believe you can perform VMotion without powering off a VM, but I believe it cannot have any VMware snapshots.
I personally am not well versed on Storage VMotion, so I cannot really give you any ideas on how well that would work for the customer.
How many VMs does he have per LUN? You will not want more than 10-15 VMs per LUN because of SCSI reservations. If you need to move the LUNs to a new volume, I would use Storage VMotion (there is a plugin for Virtual Center that works a lot better than using the command line). I would Storage VMotion the VMs to a different LUN, then create the new volume and LUNs, then Storage VMotion the VMs onto the new LUNs.
I am having an issue where SMVI is leaving behind the VMware snapshots it creates when it should be deleting them. I see this in the logs:
2009-05-04 01:15:15,493 WARN - VMware Task "CreateSnapshot_Task" for entity "NYC-IIS05" failed with the following error: Operation timed out.
2009-05-04 01:15:15,493 ERROR - VM "NYC-IIS05" will not be backed up since vmware snapshot create operation failed.
But a snapshot is created and then is not removed after the backup finishes. I have case 2000733732 open about this issue.
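Until the case is resolved, one workaround is to periodically sweep vCenter for leftover SMVI-created VMware snapshots and delete them by hand. The filtering logic might look like the sketch below; note the "smvi" name prefix and the age threshold are both assumptions to adapt to what you actually see in Snapshot Manager, and the (name, create_time) pairs would come from your own vCenter inventory query:

```python
from datetime import datetime, timedelta

def stale_smvi_snapshots(snapshots, max_age_hours=24, prefix="smvi"):
    """snapshots: iterable of (name, create_time) pairs gathered from
    vCenter. Returns names that look SMVI-created (prefix match is an
    assumption about SMVI's snapshot naming) and are older than
    max_age_hours - i.e. snapshots a finished backup should already
    have removed."""
    cutoff = datetime.now() - timedelta(hours=max_age_hours)
    return [name for name, created in snapshots
            if name.lower().startswith(prefix) and created < cutoff]
```

Treat anything this flags as a candidate for manual review rather than blind deletion, since a legitimately running backup will also hold a young SMVI snapshot.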