Tech ONTAP Blogs

Protect tier-1 applications and databases with VMware vSphere Metro Storage Cluster and ONTAP

bingen
NetApp

NetApp ONTAP 9.10.1 adds many new features that expand the performance and supported scale of the VMware vSphere Metro Storage Cluster (vMSC) solution built on SnapMirror Business Continuity (SM-BC). I thought now would be a fantastic time to revisit the solution and introduce you to the ONTAP 9.10.1 enhancements.

 

What’s new with SM-BC and ONTAP 9.10.1?

 

With the ONTAP 9.10.1 release, the maximum number of consistency groups has increased from 5 to 20. Each group guarantees dependent write-order consistency for up to 16 volumes, up from 12 previously. The number of SnapMirror Synchronous (SM-S) relationships available to SM-BC has also grown steadily, reaching 60 in ONTAP 9.9.1 and rising again to 200 in ONTAP 9.10.1.
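When planning which volumes go into which consistency groups, it can help to check the layout against these limits before you start. The following is a minimal planning sketch, not a NetApp tool. The `ConsistencyGroup` class and `check_smbc_plan` function are hypothetical, and the sketch assumes that each protected volume counts as one SM-S relationship toward the 200-relationship limit.

```python
from dataclasses import dataclass

# SM-BC limits in ONTAP 9.10.1, as stated in this post.
MAX_CONSISTENCY_GROUPS = 20
MAX_VOLUMES_PER_GROUP = 16
MAX_SMS_RELATIONSHIPS = 200


@dataclass
class ConsistencyGroup:
    """Hypothetical planning record: a group name and its member volumes."""
    name: str
    volumes: list[str]


def check_smbc_plan(groups: list[ConsistencyGroup]) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the limits.

    Assumption: each protected volume counts as one SnapMirror Synchronous
    relationship toward the relationship limit.
    """
    problems = []
    if len(groups) > MAX_CONSISTENCY_GROUPS:
        problems.append(f"{len(groups)} consistency groups exceeds {MAX_CONSISTENCY_GROUPS}")
    for group in groups:
        if len(group.volumes) > MAX_VOLUMES_PER_GROUP:
            problems.append(
                f"{group.name}: {len(group.volumes)} volumes exceeds {MAX_VOLUMES_PER_GROUP}"
            )
    total = sum(len(group.volumes) for group in groups)
    if total > MAX_SMS_RELATIONSHIPS:
        problems.append(f"{total} relationships exceeds {MAX_SMS_RELATIONSHIPS}")
    return problems
```

Under that assumption, 13 full groups of 16 volumes (208 volumes) stay within the per-group limits but exceed the 200-relationship limit, so the relationship count, not the group count, is what you would hit first.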

 

But there are other related improvements since the original 9.8 release that are also relevant to this discussion.

 

As you may know, ONTAP 9.8 also introduced support for maximum-size VMFS6 datastores (64TB; ONTAP itself supports LUNs of up to 128TB) on ONTAP All SAN Array (ASA) platforms. And since ONTAP 9.9.1, we've improved single-LUN I/O performance dramatically: by nearly 400% under some workloads compared with ONTAP 9.8. So now you can safely deploy massive, high-performance datastores to serve your largest VMs and protect them with SM-BC by using vMSC.
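The sizing arithmetic is worth spelling out: the per-datastore ceiling is whichever limit is smaller, so with the figures above the 64TB VMFS6 limit is the one that matters, not the 128TB LUN limit. This is a minimal sketch. `datastores_needed` is a hypothetical helper, and it deliberately ignores free-space headroom, which you would normally reserve.

```python
import math

VMFS6_MAX_TB = 64        # maximum VMFS6 datastore size, as stated above
ONTAP_MAX_LUN_TB = 128   # maximum ONTAP LUN size on ASA platforms, as stated above


def datastores_needed(total_capacity_tb: float) -> int:
    """Number of maximum-size VMFS6 datastores needed for a given capacity.

    The per-datastore ceiling is the smaller of the two limits; with the
    figures above, VMFS6 (64TB) is the binding limit. No headroom is reserved.
    """
    per_datastore = min(VMFS6_MAX_TB, ONTAP_MAX_LUN_TB)
    return math.ceil(total_capacity_tb / per_datastore)
```

For example, `datastores_needed(200)` returns 4, because 200 / 64 = 3.125 rounds up to four datastores.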

 

Implementing vMSC with SM-BC

 

Let's look at an example vMSC deployment using SM-BC.

 

A quick note: all the steps are taken from the following documents:

 

And here are some additional references:

 

Some high-level notes:

  • A maximum round-trip time (RTT) of 10ms between replicas is required by both NetApp and VMware.
  • VMware states that Storage I/O Control (SIOC) must be disabled. As a result, the reports in NetApp ONTAP tools for VMware vSphere cannot display latency statistics that depend on SIOC.
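Before you build anything, it's worth measuring the inter-site latency against the 10ms RTT requirement. This is a minimal sketch that assumes you have already collected RTT samples in milliseconds (for example, from repeated pings between sites). `rtt_within_limit` is a hypothetical helper. It checks the worst sample rather than the average, which is a deliberately conservative choice.

```python
MAX_RTT_MS = 10.0  # maximum round-trip time between replicas, as stated above


def rtt_within_limit(samples_ms: list[float], limit_ms: float = MAX_RTT_MS) -> bool:
    """Return True if every measured RTT sample is at or below the limit.

    Using the worst sample (not the mean) is a conservative assumption:
    a link that averages 8 ms but spikes to 14 ms fails this check.
    """
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    return max(samples_ms) <= limit_ms
```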

 

The configuration proceeds in the following order.

 

Prepare your ESXi cluster

 

To prepare your ESXi cluster, complete the following steps:

  1. Set HA admission control to use the cluster resource percentage, and leave the default of 50% for CPU and memory.

[Screenshot: vSphere HA admission control settings]

 

Add two isolation addresses that respond to ping, one per site. Do not use the gateway IP. The relevant vSphere HA advanced settings follow the das.isolationaddressX naming (for example, das.isolationaddress0 and das.isolationaddress1). You can use ONTAP or Mediator IP addresses for this purpose.

Refer to: https://core.vmware.com/resource/vmware-vsphere-metro-storage-cluster-recommended-practices#sec2-sub5
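To make the isolation-address rules concrete, the following minimal sketch builds the two advanced options from one address per site and rejects a gateway IP or a duplicated address. `isolation_settings` is a hypothetical helper, and it does not check reachability, so you still need to confirm that each address answers ping. You would then enter the resulting key/value pairs as vSphere HA advanced options.

```python
import ipaddress


def isolation_settings(site_a_ip: str, site_b_ip: str, gateways: set[str]) -> dict[str, str]:
    """Build HA isolation-address options, one address per site (hypothetical helper).

    Keys follow the das.isolationaddressX naming. Reachability (the
    addresses must answer ping) is NOT checked here; verify it separately.
    """
    gateway_ips = {ipaddress.ip_address(g) for g in gateways}
    # ip_address() raises ValueError for malformed input.
    addresses = [ipaddress.ip_address(site_a_ip), ipaddress.ip_address(site_b_ip)]
    if addresses[0] == addresses[1]:
        raise ValueError("use a different isolation address for each site")
    for addr in addresses:
        if addr in gateway_ips:
            raise ValueError(f"{addr} is a gateway IP; choose another address")
    return {f"das.isolationaddress{i}": str(addr) for i, addr in enumerate(addresses)}
```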

 

You can increase the number of heartbeat datastores by adding the advanced setting das.heartbeatDsPerHost. Use four heartbeat datastores (HB DSs), two per site, and select the option to use datastores from the specified list and complement automatically if needed. Four are needed so that if one site fails, two HB DSs remain available. However, the HB DSs don't have to be protected with SM-BC.

Refer to: https://core.vmware.com/resource/vmware-vsphere-metro-storage-cluster-recommended-practices#sec2-sub5

[Screenshot: vSphere HA advanced options]

 

[Screenshot: heartbeat datastore selection]
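The two-per-site rule can be expressed as a simple selection step. This is a minimal sketch with hypothetical datastore names. `pick_heartbeat_datastores` is not a vSphere API. It picks the first `per_site` candidates (alphabetically) from each site and returns the matching das.heartbeatDsPerHost value together with the preferred list.

```python
def pick_heartbeat_datastores(
    datastores_by_site: dict[str, list[str]], per_site: int = 2
) -> tuple[int, list[str]]:
    """Pick heartbeat datastores evenly across sites (hypothetical helper).

    Returns (das.heartbeatDsPerHost value, preferred datastore list) so that
    losing one whole site still leaves `per_site` heartbeat datastores.
    """
    chosen: list[str] = []
    for site, names in sorted(datastores_by_site.items()):
        if len(names) < per_site:
            raise ValueError(f"site {site} has only {len(names)} candidate datastores")
        chosen.extend(sorted(names)[:per_site])
    return len(chosen), chosen
```

With two sites and the default of two per site, the setting comes out to 4, matching the recommendation above.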

 

  2. Set VM Component Protection (VMCP) to power off and restart VMs for both permanent device loss (PDL) and all paths down (APD). For APD, select the conservative restart policy.
  3. Leave response recovery disabled.

[Screenshot: VMCP failure response settings]

 

  4. Make sure Disk.AutoremoveOnPDL is set to 1 in the Advanced System Settings of each ESXi host.

[Screenshot: Disk.AutoremoveOnPDL in ESXi host Advanced System Settings]
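If you manage many hosts, a quick audit of these settings can catch drift before failover testing. The following is a minimal sketch that assumes you have already exported each host's Disk.AutoremoveOnPDL value and the cluster's VMCP choices into plain dictionaries. The VMCP key names and value strings are hypothetical labels for this sketch, not vSphere API identifiers.

```python
# Expected values from the steps above. The VMCP keys and strings are
# hypothetical labels for this audit, not vSphere API names.
EXPECTED_HOST_SETTINGS = {"Disk.AutoremoveOnPDL": 1}
EXPECTED_VMCP = {
    "pdl_response": "power off and restart VMs",
    "apd_response": "power off and restart VMs (conservative)",
    "apd_response_recovery": "disabled",
}


def audit(host_settings: dict[str, dict[str, int]], vmcp: dict[str, str]) -> list[str]:
    """Return findings for any host or cluster setting that differs from the plan."""
    findings = []
    for host, settings in sorted(host_settings.items()):
        for key, expected in EXPECTED_HOST_SETTINGS.items():
            actual = settings.get(key)
            if actual != expected:
                findings.append(f"{host}: {key} is {actual}, expected {expected}")
    for key, expected in EXPECTED_VMCP.items():
        actual = vmcp.get(key)
        if actual != expected:
            findings.append(f"cluster: {key} is {actual!r}, expected {expected!r}")
    return findings
```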

 

Configure NetApp software

 

To support the NetApp components, deploy and configure the following software:

  1. Deploy a virtual machine (VM) running CentOS 7.6–7.9, RHEL 7.6–7.9, or RHEL 8.0–8.4 to serve as the ONTAP Mediator host.
  2. Install ONTAP Mediator on the VM, optionally along with certificates.
  3. Peer your ONTAP clusters and storage virtual machines (SVMs). We’ll be using the standard workflows here.
  4. Deploy and configure ONTAP tools. Again, we’ll be using the standard workflow here, so add both clusters.
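The Mediator host OS ranges above are easy to get wrong when you reuse an existing template, so here is a minimal check. It encodes only the versions listed in step 1. `mediator_os_supported` is a hypothetical helper, and it expects the version as a "major.minor" string (extra components such as a CentOS build number are ignored).

```python
# Supported Mediator host OS versions, as listed in step 1 above.
SUPPORTED_MEDIATOR_OS = {
    "centos": [((7, 6), (7, 9))],
    "rhel": [((7, 6), (7, 9)), ((8, 0), (8, 4))],
}


def mediator_os_supported(distro: str, version: str) -> bool:
    """Return True if distro/version falls in one of the supported ranges.

    `version` must start with "major.minor", e.g. "7.9" or "7.9.2009".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    ranges = SUPPORTED_MEDIATOR_OS.get(distro.lower(), [])
    # Tuples compare element by element, so (7, 9) <= (7, 9) <= (7, 9) holds.
    return any(low <= (major, minor) <= high for low, high in ranges)
```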

 

Provision and protect a SAN datastore

 

To provision and protect a SAN datastore, complete the following steps:

  1. Using ONTAP tools, provision a SAN datastore on site A. This will ensure that all hosts in site A have the correct initiator mappings and that the LUN is created with the correct options for optimal use as a vSphere datastore.
  2. Use ONTAP System Manager to enable SM-BC. Go to Protection, expand the menu, and select Relationships. Click Protect, and select LUNs. Then create a new consistency group or select an existing one, and uncheck the enforce option so that it does not conflict with any quality-of-service (QoS) policies that ONTAP tools sets based on VM storage policies.

The following two screenshots illustrate the process.

[Screenshot 1 of 2: enabling SM-BC in ONTAP System Manager]

 

[Screenshot 2 of 2: enabling SM-BC in ONTAP System Manager]

 

  3. After the SnapMirror relationship is in sync, map the replica LUN on the destination using the same LUN ID as on site A, and then rescan storage on the ESXi hosts.
  • Which hosts you map the LUNs to depends on your host access topology. For a brief description of the differences between the uniform and non-uniform host access deployment models, see the KB. Both models have pros and cons, but they're beyond the scope of this post.

[Screenshot: mapping the replica LUN on the destination]
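Because the replica must be mapped with the same LUN ID as on site A, a side-by-side comparison is a cheap sanity check before rescanning. This is a minimal sketch that assumes you have collected each site's LUN-to-ID mappings into dictionaries keyed by LUN name. `lun_id_mismatches` is a hypothetical helper. It reports replicas that are either missing or mapped with a different ID.

```python
from typing import Optional


def lun_id_mismatches(
    site_a: dict[str, int], site_b: dict[str, int]
) -> dict[str, tuple[int, Optional[int]]]:
    """Compare LUN IDs of source and replica mappings (hypothetical helper).

    Returns {lun_name: (site_a_id, site_b_id)} for every LUN whose replica is
    unmapped (site_b_id is None) or mapped with a different LUN ID.
    """
    return {
        name: (lun_id, site_b.get(name))
        for name, lun_id in sorted(site_a.items())
        if site_b.get(name) != lun_id
    }
```

For example, if site A maps ds01, ds02, and ds03 as IDs 0, 1, and 2, and site B maps ds01 as 0 and ds02 as 5, the result flags ds02 as (1, 5) and ds03 as (2, None).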

 

Now the datastore should be visible to all mapped hosts.

[Screenshot: datastore visible to all mapped hosts]

 

Notice that you now have twice as many paths, but the active (I/O) paths are still the ones from the node that owns the aggregate.

Before:

[Screenshot: storage paths before mapping the replica LUN]

 

After:

[Screenshot: storage paths after mapping the replica LUN]
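The doubling is simple arithmetic, and a rough model helps you predict what you should see in the path list. This is a minimal sketch under stated assumptions: every host initiator port is zoned to every target LIF, each reporting node has the same number of LIFs, the LUN is reported by the owning node and its HA partner on site A, and mapping the replica adds the equivalent two nodes on site B. `summarize_paths` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class PathSummary:
    total: int
    active_io: int


def summarize_paths(initiator_ports: int, lifs_per_node: int, nodes_reporting: int) -> PathSummary:
    """Rough path count for one LUN, assuming full-mesh zoning.

    Only paths to the aggregate-owning node carry I/O, so active_io does not
    change when more nodes start reporting the LUN.
    """
    total = initiator_ports * lifs_per_node * nodes_reporting
    active_io = initiator_ports * lifs_per_node
    return PathSummary(total=total, active_io=active_io)
```

With 2 initiator ports and 2 LIFs per node, the "before" case (2 reporting nodes) gives 8 paths with 4 active, and the "after" case (4 reporting nodes) gives 16 paths with the same 4 active.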

 

Follow-up work to be done before going into production

 

The following tasks should be performed on both site A and site B. They are in addition to the general best practice of using ONTAP tools to apply the recommended settings to your ESXi hosts.

 

Consider application dependencies and set the VM restart order accordingly. For example, a Microsoft Windows Active Directory domain controller should start before a Microsoft SQL Server VM.

 

The following screenshots show an example:

[Screenshot 1 of 2: VM restart ordering example]

 

[Screenshot 2 of 2: VM restart ordering example]
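Working out the restart order is a dependency-ordering problem, so a topological sort is a natural way to derive it. The following minimal sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group VMs into restart tiers, where each tier only depends on earlier tiers. The VM names are hypothetical, and mapping the tiers onto vSphere HA restart priorities is left to you.

```python
from graphlib import TopologicalSorter

# Hypothetical VMs: each key lists the VMs that must be running before it starts.
DEPENDENCIES = {
    "sql01": {"dc01"},
    "sql02": {"dc01"},
    "app01": {"sql01", "sql02"},
}


def restart_tiers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into restart tiers; raises graphlib.CycleError on circular dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    tiers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all VMs whose dependencies are satisfied
        tiers.append(ready)
        sorter.done(*ready)
    return tiers
```

For the example dependencies, `restart_tiers(DEPENDENCIES)` returns `[["dc01"], ["sql01", "sql02"], ["app01"]]`: the domain controller first, then both SQL Server VMs, then the application VM.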

 

  1. Also, you’ll want to create host and VM groups to set site affinities.

[Screenshot: host and VM groups]

 

  2. Set the rule to Should rather than Must, so that vSphere HA can still restart the VMs on hosts at the other site if a site fails.

[Screenshot: VM/host rule set to Should]
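The difference between Should and Must is easiest to see in a site-failure scenario. The following is a deliberately simplified model, not vSphere's actual placement logic. `candidate_hosts` is a hypothetical helper that returns the hosts a VM could be restarted on under each rule type.

```python
def candidate_hosts(preferred: set[str], all_hosts: set[str], failed: set[str], rule: str) -> set[str]:
    """Hosts a VM could restart on, in a simplified model of VM/host rules.

    "must":   only surviving hosts in the preferred host group.
    "should": surviving preferred hosts if any exist, otherwise any surviving host.
    """
    surviving = all_hosts - failed
    preferred_alive = preferred & surviving
    if rule == "must":
        return preferred_alive
    if rule == "should":
        return preferred_alive or surviving
    raise ValueError(f"unknown rule type: {rule}")
```

If site A hosts a1 and a2 both fail, a VM with a Must rule for site A has nowhere to go (an empty set), while a Should rule lets it restart on b1 or b2 at site B.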

 

  3. Datastores created by ONTAP tools do not have SIOC enabled by default; however, datastores created or edited manually might. For those datastores, remember to turn it off.

Right-click the datastore and select Configure Storage I/O Control.

[Screenshot: Configure Storage I/O Control menu option]

 

Make sure it is disabled.

  • Disabling SIOC affects the ability of ONTAP tools for VMware vSphere to collect storage statistics.

[Screenshot: Storage I/O Control disabled]

 

That about wraps it up. You’re now ready to begin failover testing!
