I am quite new to the NetApp world, but still very happy with the gear 🙂
We have a MetroCluster with 2 FAS 3160 boxes, and on top of that 20 blades running VMware ESX, with a total of about 220 virtual machines.
We really like the SMVI product; it is very nice to complete backups in under an hour.
BUT, we haven't yet made a complete, error-free backup. Every day we see some machines not being backed up because of either an "Operation timed out" or a "Creating a quiesced snapshot failed because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine" error.
Until now we have:
The latest host utils, 5_0R2 on all ESX boxes
The latest ESX version, update 4, 3.5.0 153875
The latest VMware Tools on all VMs
The latest vCenter, 2.5.0 u4
Set the disk timeout registry key on all the VMs to 120 seconds, although I have seen some mention that it should be 190 seconds?
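For anyone following along: the disk timeout mentioned in the last item is, as far as I know, the `TimeOutValue` DWORD under `HKLM\SYSTEM\CurrentControlSet\Services\Disk` in Windows guests, expressed in seconds. Below is a minimal Python sketch that only builds the matching `reg add` command line for a chosen value. The helper name `disk_timeout_command` is made up for illustration, and the sketch does not settle whether 120 or 190 is the right number.

```python
# Registry location of the Windows disk I/O timeout (value is in seconds).
DISK_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"


def disk_timeout_command(seconds: int) -> str:
    """Return a `reg add` command line that sets the disk timeout.

    Hypothetical helper: it only formats the command, it does not run it.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return (
        f'reg add "{DISK_KEY}" /v {VALUE_NAME} '
        f"/t REG_DWORD /d {seconds} /f"
    )


if __name__ == "__main__":
    # Print the command for both values mentioned in this thread.
    for seconds in (120, 190):
        print(disk_timeout_command(seconds))
```

The printed command would be run from an elevated prompt inside each Windows guest. A reboot is commonly advised afterwards, but check that for your own environment.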
Which guest OS(es) are you running? I have seen this issue many times on Windows 2003 (all versions, including R2, Standard, Enterprise, etc.). Even though you are only using VMDKs, and therefore not using SnapDrive, I still recommend that you install several Microsoft hotfixes that are required for SnapDrive.
SMVI calls on the VSS stack in Windows to create a quiesced snapshot, and many of these hotfixes improve VSS. I found even newer hotfixes that reduced the number of quiesce errors further, but the errors have not disappeared completely for me either. I have also opened several cases with NetApp, but they have not been able to resolve the issue fully. It seems to be a combined NetApp/Microsoft issue. Right now, I install these hotfixes on all of my Windows 2003 machines:
All of these hotfixes are available for x86 and x64 architectures. Let me know if this helps.
One more thing: are you using the StorPort driver for the LSI SCSI card in the VM? That should also help. The old type of driver is ScsiPort, and is called "symmpi.sys". The StorPort driver is "lsi_scsi.sys" (check in Device Manager).
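If you would rather check many VMs from a list than open Device Manager on each one, here is a rough Python sketch. It assumes you have already collected the loaded driver file names for a VM by some other means. The function name `lsi_driver_model` is hypothetical, and the mapping uses only the two file names mentioned above.

```python
# Driver file names and their driver models, as given in the post above.
LSI_DRIVERS = {
    "symmpi.sys": "ScsiPort",
    "lsi_scsi.sys": "StorPort",
}


def lsi_driver_model(driver_files):
    """Return the sorted driver models found among the given file names.

    Matching is case-insensitive because Windows file names are.
    """
    found = {name.lower() for name in driver_files}
    return sorted(model for name, model in LSI_DRIVERS.items() if name in found)


if __name__ == "__main__":
    # Example input: a made-up driver list for one VM.
    print(lsi_driver_model(["SYMMPI.SYS", "disk.sys"]))
```

An empty result means neither of the two drivers was in the list you supplied.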
So maybe we have to use SMVI for most of the servers, and another solution for the more transaction-critical servers.
Yes, I am aware of SnapManager for SQL and SharePoint, which we also have, but we have not yet had much success with them, and we have asked our vendor to help solve that problem. It mainly concerns RDM disks in either virtual or physical mode, and problems with VMware's VMotion.
I will dig deeper into your information, do some reboots, and come back with the results.
Have you heard anything about what will be new in SMVI version 2.0, and when it will be released?