VMware Solutions Discussions

VMs Powered Off Unexpectedly when Using SMVI

willgreen
3,786 Views

I recently started using SMVI on my NFS-based VMware Infrastructure setup (details of my system configuration are included below). Unfortunately, since starting to use SMVI I've been experiencing an issue with VMs powering off unexpectedly, though I should note that it doesn't happen at the same time as the snapshot. The only relevant log entry I can see is in VMware Infrastructure Client (there's nothing in the NetApp syslog to indicate an issue):

Configuration file for vm-name cannot be found

This is followed shortly afterwards by

Virtual Machine vm-name is connected

TR-3428*, section 14.3, discusses a problem related to deleting VMware snapshots and a patch (ESX350-200808401-BG) for the problem. However, it then goes on to note:

"When this patch is in use, there is a condition where virtual machines running 3rd party virtual machine management agents may get powered off unexpectedly. In order to avoid this behavior, please consult the support organization of the management agent regarding virtual disk pooling interval tuning."

I haven't followed the steps in 14.3 to fully enable the patch, but am I right in thinking that even having ESX350-200808401-BG installed could lead to this problem?

Am I also right in thinking that HP CIM constitutes a 3rd party virtual machine management agent in this context? Or is this referring to SMVI itself?

Any clarification or experience on this would be much appreciated.

Summary of setup:

  • Active-active 3140 w/ ONTAP 7.3.1.1
  • SMVI 1.2
  • ESXi 3.5 u4 (176894)
  • NFS datastores
  • vCenter 2.5 u5
  • HP BL490 G6 blades

*NetApp and VMware Virtual Infrastructure 3 Storage Best Practices 4.5.2 (July 2009)

5 REPLIES

amiller_1

Hmm....if you're running ESX 3.5U4, that patch is definitely in place (it was rolled up into 3.5U3 I believe.....definitely rolled up in 3.5U4).

Are you running the ESX Host Utilities? They optimize your NFS settings and place the necessary config line in /etc/vmware/config (which helps with snapshot issues in general on NFS datastores).

I'd probably try installing the ESX Host Utilities and go from there.

Poking around /var/log/vmware on the ESX service console might be helpful as well....would give you more verbose logs from the ESX perspective of what's going on.
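As a rough illustration of that log check, here is a minimal Python sketch that searches every file under a log directory for a string such as the VM name. The function name `find_lines` is hypothetical, and the sketch assumes the logs have been copied somewhere Python is available; it makes no assumption about the log format beyond plain text lines.

```python
import os


def find_lines(log_dir, needle):
    """Yield (path, line_number, line) for each line under log_dir containing needle."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps the scan going over non-text bytes
                with open(path, "r", errors="replace") as fh:
                    for number, line in enumerate(fh, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it


if __name__ == "__main__":
    # "vm-name" is the placeholder used in this thread; substitute the real VM name.
    for path, number, line in find_lines("/var/log/vmware", "vm-name"):
        print(f"{path}:{number}: {line}")
```

Searching for the VM name around the time of the "Configuration file ... cannot be found" event should narrow down which host-side messages are worth reading.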

willgreen

Since my first message I've done the following:

  • Added the prefvmx.consolidateDeleteNFSLocks = "TRUE" line to the /etc/vmware/config file on all hosts and confirmed that NFS.LockDisable was set to 0.
  • Rebooted all of the hosts and restarted all of my VMs.
  • Checked that the Net and NFS settings as recommended in the TR were configured (they already were).
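For anyone wanting to verify the first step across several hosts, here is a minimal Python sketch that checks whether the text of a config file contains the setting on an uncommented line. The function name `has_lock_setting` is hypothetical, and the sketch assumes the file uses simple `key = "value"` lines with `#` for comments; it only reads text and does not change anything on the host.

```python
import re

# Setting name taken from the step above.
SETTING = "prefvmx.consolidateDeleteNFSLocks"

# Key must match exactly; the quoted value may be TRUE in any letter case.
_PATTERN = re.compile(
    r'^\s*' + re.escape(SETTING) + r'\s*=\s*"(?i:true)"\s*$'
)


def has_lock_setting(config_text):
    """Return True if any uncommented line sets SETTING to "TRUE"."""
    return any(_PATTERN.match(line) for line in config_text.splitlines())


if __name__ == "__main__":
    # Path as given in this thread; run against a copy if Python is not on the host.
    with open("/etc/vmware/config", "r", errors="replace") as fh:
        print(has_lock_setting(fh.read()))
```

A commented-out line or a value other than "TRUE" is reported as missing, which is the conservative choice when auditing hosts.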

However, I can't see any lock files on my NFS datastore for any of my VMs. I presume it should be in the VM's main directory along with the .vmx and .nvram file etc.?

Sadly the host utilities don't currently support ESXi (only ESX).

eric_barlier

Hi Will,

As ESX has no way of knowing where a LUN sits on a NetApp controller, SMVI sits in between ESX and the NTAP controllers like a "translator" and maps ESX's datastore/VM info to the NTAP controllers' volume/LUN layout. This allows backups/restores to occur. Thus I am not sure if your issue is SMVI related, but it's prudent not to rule it out at this point in time, of course.

On another note, we are running SMVI here and have no issues. I believe we are running 1.2 as well. What concerns me is that you are running a version of ESX that is not supported by the ESX Host Utilities. I would ask NTAP tech support if it's OK to run SMVI without the ESX Host Utilities, or if there could be some impact if they're not installed.

Cheers,
Eric

willgreen

The issue turned out to be caused by a network misconfiguration that occurred at the same time SMVI was enabled. SMVI has been running fine since that was corrected.

Thanks for the advice and apologies for not updating this question sooner.

Will

kenoakeson

What was your networking issue that you corrected?  Thanks
