BlueXP disaster recovery: How it protects your VMware virtual machines

ansley_tj · ‎2024-04-11

Hi everyone!

In my previous post, I introduced our new BlueXP™ service called BlueXP disaster recovery (DRaaS). In that post, I described the new service and how it can protect VMware VMs and their associated data to rapidly recover from a production site failure by leveraging native ONTAP SnapMirror® replication technology and integrating with vCenter to reconfigure and restart those VMs on a secondary vCenter in the event of a disaster.

In this entry, I want to review the various components that make up the DRaaS service and how those components are used to protect your virtualized datacenter.

DRaaS components

There are two primary constructs that DRaaS uses to simplify the process of managing disaster recovery protection. Let's look at each:

Resource group

A resource group is a logical container for managed objects or applications.

The resource group allows us to do two things:

Group VMs together and apply a replication plan to the entire group rather than managing a separate replication plan for each object.
Provide a VM restart order so that VMs that depend on other VMs are started after those supporting VMs.

Resource groups can contain from one to 100 VMs.
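To make the idea concrete, here is a minimal Python sketch of a resource group as an ordered container with the one-to-100 VM limit. The class, field, and VM names are hypothetical illustrations, not the DRaaS data model or API.

```python
from dataclasses import dataclass, field

MAX_VMS = 100  # upper bound on VMs per resource group, as stated in this post


@dataclass
class ResourceGroup:
    """Hypothetical model of a resource group (not the DRaaS data model)."""

    name: str
    # VM names, listed in the order they should be restarted.
    vms: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= len(self.vms) <= MAX_VMS:
            raise ValueError(f"a resource group holds 1 to {MAX_VMS} VMs")
        if len(set(self.vms)) != len(self.vms):
            raise ValueError("each VM may appear only once in a group")

    def restart_order(self) -> list[str]:
        """Return the VMs in restart order: supporting VMs first."""
        return list(self.vms)


# Example: databases start first, then business logic, then the front end.
pos = ResourceGroup("point-of-sale", ["db-01", "logic-01", "web-01"])
print(pos.restart_order())
```

Keeping the restart order as the list order is just one simple design choice for the sketch; a real product could equally store explicit priorities or dependencies.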

Replication plan

A replication plan is the set of instructions that DRaaS uses to protect the resource group and to recover the VMs in the resource group when a production site failure occurs.

In this initial release of DRaaS, the objects we place in resource groups and replicate are VMware VMs.

When you think of a resource group, think of it as an "application" in that the resource group contains all the VMs that are required to provide a production application service. For example, in a complex point-of-sale solution, this may be all the VMs that run the various databases, the VMs running the business logic, and the VMs running the front-end presentation layer. We would want them to be managed from a DR perspective as a single “application” with a single protection and failover policy.

So, how do we use these constructs to protect our VMs?

Protecting VMs using BlueXP disaster recovery

So, let's now take a quick look at how resource groups and replication plans work together to protect your VM infrastructure.

As defined above, a resource group is simply a container to hold the objects we want to protect. In this case, those objects are VMware VMs hosted in a vCenter SDDC. Each VM has a series of files that contain the current definition and state of the VM. These state files are stored in a vCenter datastore. VMware datastores can be one of four different types: VMFS, NFS, vSAN, and vVol. For the initial release of DRaaS, NFS datastores are supported.

Since each VM is associated with a datastore, DRaaS uses that to identify the ONTAP NFS volume that is hosting that datastore. Thus, simply by providing DRaaS a set of VMs, contained in a resource group, it can identify the ONTAP volumes that need to be protected.
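The lookup chain described here (VM to datastore to ONTAP volume) can be sketched in a few lines of Python. The inventory dictionaries and all names in them are made-up sample data; the real service discovers this information from vCenter and ONTAP, and this is not how it is implemented.

```python
# Hypothetical inventory: which datastore each VM lives on...
vm_to_datastore = {
    "db-01": "ds_nfs_a",
    "logic-01": "ds_nfs_a",
    "web-01": "ds_nfs_b",
}
# ...and which (SVM, volume) pair backs each NFS datastore.
datastore_to_volume = {
    "ds_nfs_a": ("svm_prod", "vol_a"),
    "ds_nfs_b": ("svm_prod", "vol_b"),
}


def volumes_to_protect(vms, vm_to_datastore, datastore_to_volume):
    """Return the unique ONTAP volumes backing the given VMs, sorted."""
    volumes = set()
    for vm in vms:
        datastore = vm_to_datastore[vm]
        volumes.add(datastore_to_volume[datastore])
    return sorted(volumes)


protected = volumes_to_protect(
    ["db-01", "logic-01", "web-01"], vm_to_datastore, datastore_to_volume
)
print(protected)
```

Note that two VMs sharing a datastore yield a single volume to protect, which is why a set is used to collect the results.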

To protect the resource group, we link it to a replication plan. As we defined earlier, the replication plan is a set of instructions. Specifically, the replication plan does the following:

Identifies the vCenter and ONTAP array to use as a replication target.
Identifies the ONTAP SVM to host the DR ONTAP volumes.
Identifies the virtual network in the DR vCenter SDDC to connect the VM’s vNICs to.
Defines for each VM (or all VMs) the network configuration changes that can include:
- Whether the VMs are assigned static or DHCP IP addresses
- The subnet
- The gateway address
- The DNS server address
Defines for each VM (or all VMs) whether a post-failover guest OS script is to be run.
Defines how often to create replicated snapshots of the resource group VMs. This is essentially your recovery point objective.
Defines how many snapshots to retain at the DR site.
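As a rough illustration, the settings in the list above could be modeled like this in Python. Every class, field, and value here is a hypothetical stand-in chosen for the sketch (the addresses come from a documentation-only range); it is not the DRaaS schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NetworkSettings:
    """Hypothetical per-VM network changes applied at failover."""

    use_dhcp: bool = True
    subnet: Optional[str] = None
    gateway: Optional[str] = None
    dns_server: Optional[str] = None


@dataclass
class ReplicationPlan:
    """Hypothetical model of a replication plan (not the DRaaS schema)."""

    target_vcenter: str
    target_cluster: str
    target_svm: str
    target_network: str
    snapshot_interval_minutes: int  # how often to snapshot and replicate
    snapshots_to_retain: int        # how many snapshots to keep at the DR site
    default_network: NetworkSettings = field(default_factory=NetworkSettings)
    per_vm_network: dict[str, NetworkSettings] = field(default_factory=dict)
    post_failover_scripts: dict[str, str] = field(default_factory=dict)

    def network_for(self, vm: str) -> NetworkSettings:
        """Per-VM settings win; otherwise fall back to the plan-wide default."""
        return self.per_vm_network.get(vm, self.default_network)


plan = ReplicationPlan(
    target_vcenter="vcenter-dr.example.com",
    target_cluster="ontap-dr",
    target_svm="svm_dr",
    target_network="dr-portgroup",
    snapshot_interval_minutes=15,
    snapshots_to_retain=96,
    per_vm_network={
        "db-01": NetworkSettings(
            use_dhcp=False,
            subnet="192.0.2.0/24",
            gateway="192.0.2.1",
            dns_server="192.0.2.53",
        )
    },
    post_failover_scripts={"db-01": "/opt/dr/post_failover.sh"},
)
print(plan.network_for("db-01"))
print(plan.network_for("web-01"))
```

The "for each VM (or all VMs)" wording maps naturally onto a plan-wide default plus optional per-VM overrides, which is the choice made in `network_for`.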

Note that a resource group can be part of multiple replication plans and a replication plan can be linked to multiple resource groups. The former use case would indicate what we call at NetApp a “fan-out” scenario, where we might want a second backup of the resource group. The latter use case would be used if we have multiple sets of VMs that need to be protected in the same manner and failed over at the same time in the event of a production site failure.

Once the replication plan is associated with the resource group, you simply tell DRaaS to enable the protection and that’s it. DRaaS takes over and performs the necessary steps to protect the VMs in the resource group. To do this, it creates any required volumes on the DR ONTAP cluster and sets up the required SnapMirror relationships between the source ONTAP cluster and the destination DR ONTAP cluster. These SnapMirror relationships enable ONTAP to replicate the NFS volume’s data to the destination based on the native point-in-time Snapshot® Copy process within ONTAP. The replication plan defines how often to take these snapshots and replicate them to the DR cluster.
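The snapshot frequency and the retention count together determine how far back in time you can recover. Here is a small Python sketch of that arithmetic. It assumes evenly spaced snapshots and ignores transfer time, so treat it as a back-of-the-envelope estimate; the function name and the sample values are hypothetical.

```python
from datetime import timedelta


def retained_span(interval_minutes: int, snapshots_to_retain: int) -> timedelta:
    """Time between the newest and the oldest retained snapshot.

    Assumes snapshots are evenly spaced; replication transfer time is ignored.
    """
    if interval_minutes <= 0 or snapshots_to_retain < 1:
        raise ValueError("interval must be positive and at least 1 snapshot kept")
    return timedelta(minutes=interval_minutes * (snapshots_to_retain - 1))


# Example: a snapshot every 15 minutes, keeping 96 of them at the DR site.
span = retained_span(15, 96)
print(span)
```

With these sample values the retained snapshots span 23 hours 45 minutes, while the worst-case data loss for the latest recovery point stays close to the 15-minute interval.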

The only thing left to do now is wait for a failover event, while hoping never to experience one. If that happens, failover is a user-initiated process. SnapMirror will not automatically initiate the failover, but doing so is as simple as selecting “Fail over” from the replication plan’s menu.

What happens when we have to fail over to the DR site?

Once you click the Fail over menu item, DRaaS initiates the failover process. This process includes the following steps:

Perform a compliance check to verify that the DR vCenter and ONTAP clusters are active and ready to be activated.
Allow the user to select the snapshot from those located on the DR cluster to recover from. This can be the latest snapshot or a previous snapshot. The latter case might be useful in the case of a ransomware attack where you can restore from a snapshot prior to the intrusion of the ransomware.
Break all SnapMirror relationships to activate the DR ONTAP cluster volumes.
Restore from the selected snapshot.
Register the DR volumes as vCenter datastores within the DR vCenter cluster.
Reconfigure each VM based on the replication plan’s settings.
Start the VMs in the specified order, with the per-VM delays defined in the replication plan.
Execute any guest OS scripts required for each VM as defined in the replication plan.
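The key property of the steps above is that they run in a fixed order and a failure should stop the sequence. Here is a minimal Python sketch of that pattern. The step functions are empty placeholders and all names are hypothetical; this illustrates the orchestration idea only, not how DRaaS is actually implemented.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]


class FailoverAborted(RuntimeError):
    """Raised when a step fails; later steps are not attempted."""


def run_failover(steps: list[Step]) -> list[str]:
    """Run the steps in order and return the names of the completed ones."""
    completed: list[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise FailoverAborted(
                f"step '{name}' failed; completed so far: {completed}"
            ) from exc
        completed.append(name)
    return completed


def placeholder() -> None:
    """Stand-in for real work against vCenter or ONTAP."""


# The eight steps from the list above, in order (placeholders only).
STEPS: list[Step] = [
    ("compliance check", placeholder),
    ("select snapshot", placeholder),
    ("break SnapMirror relationships", placeholder),
    ("restore from snapshot", placeholder),
    ("register datastores", placeholder),
    ("reconfigure VMs", placeholder),
    ("start VMs in order", placeholder),
    ("run guest OS scripts", placeholder),
]

print(run_failover(STEPS))
```

Running the compliance check first means a problem at the DR site is caught before any SnapMirror relationship is broken.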

Looking at those steps, think about how difficult that would be to do manually…especially the VM reconfigurations! With a single click of the mouse, DRaaS can rapidly restart any set of VMs, resulting in a faster recovery from a production failure event.

Final thoughts

In this blog, I described the major components of the BlueXP disaster recovery architecture and how those components are used to protect your VMware VM infrastructure.

Accessing DRaaS is simple. It is available on any BlueXP account today. Simply go to your BlueXP account and select Protection -> Backup and recovery.

We are also offering a 90-day, unlimited free trial, so you can give it a test run within your environment. I encourage you to use this opportunity and see how BlueXP disaster recovery can help you better protect your VMware infrastructure.

If you want to learn more, check out the following resources:

BlueXP DR Blog Index: