I'm currently doing a design for a customer utilising two NetApp MetroCluster pairs running Data ONTAP 7-Mode, and VMware vSphere 5.5 Enterprise Plus. One of the resources I've been referencing is NetApp's TR-4128, vSphere 5 on NetApp MetroCluster Solution. The document is excellent, especially around providing test plans and expected failure scenarios, but there are a couple of recommendations that seem to go against my understanding of best practice.
Admission Control
On page 19,
For the Admission Control option, select “Disable” since the solution's main goal is maximum availability rather than performance in the case of host failure.
This seems like a non sequitur. A large part of providing availability for an application is guaranteeing minimum performance levels, typically dictated by an SLA. In a stretched metro storage cluster environment, with half of your storage and half of your hosts on each site, the solution must provide minimum performance even after losing 50% of both your compute and storage resources.
In the context of a vSphere HA cluster, enforcing minimum performance levels for your applications is done at the VM level through resource reservations. Guess what will happen in an HA event, where you're reduced to 50% capacity, but your VMs have reservations configured that cannot be satisfied by the remaining hosts? The HA placement request will fail at the surviving site and HA will fail to restart them, completely defeating the point of your expensive MetroCluster solution.
Admission Control is designed to prevent exactly this sort of thing from happening. In normal operation, even running at 100% capacity, it will prevent you powering on more VMs than can be supported in the event of a failure. The configuration should be:
Admission Control: Enable: Disallow VM power on operations that violate availability constraints.
Admission Control Policy: Percentage of cluster resources reserved as failover capacity: 50%
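To make the failure mode concrete, here is a minimal sketch (in Python, with hypothetical host and VM figures) of what the percentage-based admission control policy is doing. It is a simplified model: real HA also accounts for memory overhead, VMs without reservations, and per-host granularity.

```python
# Simplified model of the "percentage of cluster resources reserved
# as failover capacity" admission control policy. All figures are
# hypothetical examples, not output from the VMware API.

def admits_power_on(total_mhz, total_mb, reserved_pct,
                    existing_cpu_res, existing_mem_res,
                    new_cpu_res, new_mem_res):
    """Return True if the new VM's reservations still fit within the
    capacity left after setting aside the failover headroom."""
    usable_mhz = total_mhz * (1 - reserved_pct / 100)
    usable_mb = total_mb * (1 - reserved_pct / 100)
    return (existing_cpu_res + new_cpu_res <= usable_mhz and
            existing_mem_res + new_mem_res <= usable_mb)

# Four hosts split across two sites: 4 x 20 GHz CPU, 4 x 128 GB RAM,
# with 50% reserved so that one whole site can fail.
total_mhz, total_mb = 4 * 20_000, 4 * 131_072

# Plenty of headroom left: power-on is admitted.
print(admits_power_on(total_mhz, total_mb, 50,
                      existing_cpu_res=35_000, existing_mem_res=200_000,
                      new_cpu_res=4_000, new_mem_res=16_384))   # True

# This VM would push reservations past the surviving site's capacity,
# so admission control blocks it *before* a failure can strand it.
print(admits_power_on(total_mhz, total_mb, 50,
                      existing_cpu_res=38_000, existing_mem_res=250_000,
                      new_cpu_res=4_000, new_mem_res=16_384))   # False
```

With admission control disabled, the second VM powers on happily in normal operation, and its restart placement is exactly what fails after a site loss.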
This is also the recommendation from the VMware publication, VMware vSphere Metro Storage Cluster Case Study.
On page 10,
Further, because such hosts are equally divided across the two sites, and to ensure that all workloads can be restarted by vSphere HA, configuring the admission control policy to 50 percent is advised
In the case where you're not using reservations, you should configure reasonable ball-park figures for das.vmCpuMinMhz and das.vmMemoryMinMB to ensure that admission control stops you deploying more VMs than you can service during a failover scenario.
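As a back-of-the-envelope check (hypothetical figures again), you can estimate how many reservation-less VMs the 50% policy will admit given ball-park per-VM minimums, which is essentially what das.vmCpuMinMhz and das.vmMemoryMinMB feed into:

```python
# Rough estimate of how many reservation-less VMs fit before the
# percentage-based admission control policy blocks further power-ons,
# using ball-park per-VM minimums. Hypothetical numbers throughout.

def max_vms(total_mhz, total_mb, reserved_pct, vm_min_mhz, vm_min_mb):
    usable_mhz = total_mhz * (1 - reserved_pct / 100)
    usable_mb = total_mb * (1 - reserved_pct / 100)
    # The tighter of the CPU and memory limits wins.
    return int(min(usable_mhz // vm_min_mhz, usable_mb // vm_min_mb))

# 4 hosts x 20 GHz / 128 GB, 50% reserved, 500 MHz / 2 GB per VM.
print(max_vms(80_000, 524_288, 50, vm_min_mhz=500, vm_min_mb=2_048))  # 80
```

If your real per-VM minimums are lower than what the surviving site can actually deliver under load, admission control will admit more VMs than your SLA can tolerate, so err on the generous side.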
Host Isolation Response
On page 20,
In iSCSI/NFS environments in which the management network correlates with the IP storage network, it is impossible for hosts to decide whether it is fully isolated. In these environments, it is better to change the setting to “Shutdown,” which will gracefully shut down the VMs whenever there is an isolation response. This avoids split-brain scenarios too.
Using the IP addresses of the array as isolation addresses means that, when the host triggers its HA response, it knows that it cannot reach its datastores. In this case, the VMs cannot write to their disks and thus cannot gracefully shut down or flush their dirty write buffers. Using the "Shutdown" isolation response will only delay the shutdown; the appropriate response in an IP storage environment is "Power Off".
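The reasoning above can be sketched as a simple decision function (illustrative Python, not a VMware API; the inputs are hypothetical flags standing in for the host's isolation checks):

```python
# Sketch of the isolation-response reasoning: when the isolation
# addresses are the array's IP storage interfaces, failing to reach
# them implies the datastores are gone too, so a graceful guest
# shutdown (which needs disk writes) cannot actually complete.

def isolation_response(can_reach_isolation_addrs: bool,
                       addrs_are_storage_ips: bool) -> str:
    if can_reach_isolation_addrs:
        return "not isolated"          # no HA response triggered
    if addrs_are_storage_ips:
        # No datastore access: guests cannot flush dirty buffers or
        # shut down cleanly, so "Shutdown" only delays the outcome.
        return "Power Off"
    return "Shutdown"

print(isolation_response(False, True))   # Power Off
print(isolation_response(False, False))  # Shutdown
```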
I would love to get some input from others who have designed similar solutions, or clarification on these vSphere HA configurations, especially as they relate to MetroCluster.
Thanks in advance!