Tech ONTAP Blogs

OpenShift Virtualization Disaster Recovery with NetApp Trident Protect


Written by @LuisRico  and @Rahul-Rana

 


 


Introduction

 

Red Hat OpenShift Virtualization 

Red Hat OpenShift Virtualization provides a virtualization platform on top of OpenShift to run and manage Windows and Linux Virtual Machines (VMs) alongside containers, using Kubernetes custom resources to enable common virtualization tasks.

It’s based on the open-source KubeVirt project, which has become one of the most popular and active projects in the CNCF (Cloud Native Computing Foundation).

Although the KubeVirt project started in 2017, it has only gained real popularity as an alternative virtualization platform in recent years. 

To run VMs in OpenShift Virtualization in production, the worker nodes of the OpenShift cluster must be bare-metal servers, to avoid nested virtualization. Even with that constraint, you can run it on premises and on the main hyperscalers, which offer bare-metal instances.

 

Kubernetes lacks Disaster Recovery 

Because OpenShift Virtualization runs on the Kubernetes platform, it inherits the limitations of Kubernetes and OpenShift with respect to the usual practices of virtualization administrators. Kubernetes lacks built-in Disaster Recovery (DR) tooling, and so does OpenShift Virtualization.

 

VMware SRM gap 

For VMware vSphere users, who are accustomed to DR applications such as VMware Site Recovery Manager (SRM), a data protection tool with Disaster Recovery capabilities is a must. For protecting hundreds or thousands of Virtual Machines and their VM disks, the only feasible option is to use some form of storage replication to optimize the data transfer between sites.

 

RTO and RPO requirements 

In the virtualization world, it's also common to have to comply with strict Service Level Agreements (SLAs) on VM availability, reflected in a low RPO (Recovery Point Objective) and RTO (Recovery Time Objective). That requires fast and efficient storage replication that transfers data as quickly and frequently as possible for a low RPO, and automated failover that reduces the time it takes to start the VMs on the secondary/destination site for a low RTO.

 

NetApp Trident Protect, which is available to NetApp customers at no additional cost alongside NetApp Trident, helps close this gap.

 

Requirements 

Virtual Machine running on source OCP Cluster: 

RahulRana_0-1747376598098.png

 

 

ONTAP - The source and destination storage backends must be peered, as described in our documentation.
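If you need to confirm the peering, a quick check from the ONTAP CLI of either cluster looks like the following (a minimal sketch; the peer names and states in your environment will differ):

cluster peer show
vserver peer show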

AppVault – An AppVault (object storage bucket) is used to store metadata (Kubernetes resources) for the Virtual Machine and its associated namespace, and is used during failover operations. The recommendation is to create two separate AppVault configurations, one for your source site and one for your destination site.

 

Source Cluster Setup: 

 

Cluster AppVault - Verify that the AppVault CR shared between the source and destination has been created, following the example provided below. First, create a secret holding the object storage credentials:

oc create secret generic s3-secret \
--from-literal=accessKeyID=<objectstorage-access-key> \
--from-literal=secretAccessKey=<objectstorage-secret-key> \
-n trident-protect

 

apiVersion: protect.trident.netapp.io/v1 
kind: AppVault 
metadata: 
  name: ontap-s3-trident-protect-src-bucket 
  namespace: trident-protect 
spec: 
  dataMoverPasswordSecretRef: my-optional-data-mover-secret 
  providerType: OntapS3 
  providerConfig: 
    s3: 
      bucketName: trident-protect-src-bucket 
      endpoint: s3.example.com 
      proxyURL: http://10.1.1.1:3128 
  providerCredentials: 
    accessKeyID: 
      valueFromSecret: 
        key: accessKeyID 
        name: s3-secret 
    secretAccessKey: 
      valueFromSecret: 
        key: secretAccessKey 
        name: s3-secret 

 

Create AppVault CR using command below: 

oc apply -f example-file.yaml

 

Application CR - The Application CR is a Kubernetes object that allows Trident Protect to discover and manage one or more user-provided namespaces for data protection operations such as snapshots, backups, restores, and replication.

 

Define the Application CR for your source application using the example below:

apiVersion: protect.trident.netapp.io/v1 
kind: Application 
metadata: 
  annotations: 
    protect.trident.netapp.io/skip-vm-freeze: "false" 
  name: demo-vm 
  namespace: source-vm-ns 
spec: 
  includedNamespaces: 
    - namespace: source-vm-ns 
      labelSelector: 
        matchLabels: 
          app: demo-vm 
  includedClusterScopedResources: 
    - groupVersionKind: 
        group: rbac.authorization.k8s.io 
        kind: ClusterRole 
        version: v1 
      labelSelector: 
        matchLabels: 
          mylabel: test

 

Create App CR using command below: 

oc apply -f example-file.yaml
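Optionally, confirm that Trident Protect has discovered the application (names taken from the example above; the fully qualified resource name avoids clashes with other application CRDs):

oc get applications.protect.trident.netapp.io demo-vm -n source-vm-ns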

 

Snapshot Schedule CR - The Schedule CR is used to specify the frequency (e.g., hourly) and retention (e.g., keep 24 snapshots) for data protection. These snapshots are then used for app replication from the source to the destination OCP cluster.

 

Snapshot Schedule - Create the snapshot Schedule CR using the example below:

apiVersion: protect.trident.netapp.io/v1 
kind: Schedule 
metadata: 
  name: snapshot-schedule 
  namespace: source-vm-ns  
spec: 
  appVaultRef: source-bucket 
  applicationRef: source-vm-ns 
  backupRetention: "0" 
  enabled: true 
  granularity: custom 
  recurrenceRule: |- 
    DTSTART:20220101T000200Z 
    RRULE:FREQ=MINUTELY;INTERVAL=5 
  snapshotRetention: "5" 

 

Create the Schedule CR using the command below:

oc apply -f example-file.yaml
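Once the schedule is active, Trident Protect creates Snapshot CRs at each interval. A quick way to confirm (namespace from the example; the fully qualified resource name avoids clashes with VolumeSnapshots):

oc get snapshots.protect.trident.netapp.io -n source-vm-ns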

 

 

Destination Cluster Setup: 

Destination Cluster AppVault CR: Create the same shared AppVault on the destination cluster so that it can access the application metadata created on the source cluster. Also create a destination AppVault, similar to what was done on the source cluster.
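For reference, here is a minimal sketch of a destination AppVault applied on the destination cluster. It assumes a bucket named trident-protect-dest-bucket, a credentials secret named s3-secret-dest, and the AppVault name destination-bucket used by the AMR examples below; adjust the endpoint, bucket, and credentials to your environment:

oc create secret generic s3-secret-dest \
--from-literal=accessKeyID=<objectstorage-access-key> \
--from-literal=secretAccessKey=<objectstorage-secret-key> \
-n trident-protect

apiVersion: protect.trident.netapp.io/v1
kind: AppVault
metadata:
  name: destination-bucket
  namespace: trident-protect
spec:
  providerType: OntapS3
  providerConfig:
    s3:
      bucketName: trident-protect-dest-bucket
      endpoint: s3.example.com
  providerCredentials:
    accessKeyID:
      valueFromSecret:
        key: accessKeyID
        name: s3-secret-dest
    secretAccessKey:
      valueFromSecret:
        key: secretAccessKey
        name: s3-secret-dest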

 

Configuring AMR 
 

AppMirrorRelationship CR: The AppMirrorRelationship (AMR) CR in NetApp Trident Protect is used to manage the replication of applications across OCP clusters for disaster recovery and application mobility, leveraging NetApp SnapMirror technology. It defines the replication relationship (including the replication frequency) between a source application and a destination, enabling failover, failback, or workload migration between OCP clusters.

Define the application mirror relationship with a replication schedule. 

 

Example of AMR CR:  

apiVersion: protect.trident.netapp.io/v1 
kind: AppMirrorRelationship 
metadata: 
  name: amr-demo 
  namespace: destination-vm-ns 
spec: 
  desiredState: Established 
  destinationAppVaultRef: destination-bucket 
  namespaceMapping: 
    - destination:  destination-vm-ns 
      source:  source-vm-ns 
  recurrenceRule: |- 
    DTSTART:20220101T000200Z 
    RRULE:FREQ=MINUTELY;INTERVAL=5 
  sourceAppVaultRef: source-bucket-appvault 
  sourceApplicationName:  source-vm-ns 
  sourceApplicationUID: 7498d32c-328e-4ddd-9029-122540866aeb 
  storageClassName: ontap-sc 
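The sourceApplicationUID must match the .metadata.uid of the Application CR on the source cluster. It can be retrieved with a command like the following (substitute the Application name used in your environment):

oc get applications.protect.trident.netapp.io <application-name> -n source-vm-ns -o jsonpath='{.metadata.uid}'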

 

Create AMR CR using command below: 

oc apply -f example-file.yaml

 

As we can see below, our AMR has been established and the Region B namespace contains only the PVC.

 

RahulRana_1-1747376598098.png
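The same can be verified from the CLI on Region B (namespace from the examples); only the replicated PVC should be present, with no VM yet:

oc get pvc,vm -n destination-vm-ns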

 

Production Site Region A and Standby Site Region B 

 

RahulRana_2-1747376598098.png

 

 

 

Testing Disaster Recovery - Failover without affecting the source region

 

In this section we perform disaster recovery without impacting our production site, Region A. Towards the end we will bring down the VMs running in Region B, again without affecting our original site, Region A. In a failover scenario, the standby system or environment takes over the responsibilities of the primary system or environment, ensuring business continuity.

 

Failover without affecting the source region. 

 

RahulRana_0-1747417966255.png

 

 

 

Execute Failover on Region B: 

oc patch amr -n destination-vm-ns amr-demo --type='json' -p '[{"op": "replace", "path": "/spec/desiredState", "value":"Promoted"}]' 

 

You can observe the AMR status change from Promoting to Promoted:

oc get amr amr-demo -n destination-vm-ns -w 

 

The VM is up and running in Region B after failover:

RahulRana_4-1747376598098.png

 

 

RahulRana_5-1747376598098.png
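You can also confirm this from the CLI on Region B (namespace from the examples):

oc get vm,vmi -n destination-vm-ns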

 

 

Execute the command below to discard the changes on the Region B cluster and re-establish the mirror relationship from Region A to Region B:

oc patch amr -n destination-vm-ns amr-demo --type='json' -p '[{"op": "replace", "path": "/spec/desiredState", "value":"Established"}]' 

 

Ensure the AMR state has changed from Establishing to Established:

oc get amr amr-demo -n destination-vm-ns -w

 

We can see below that the AMR is back in the Established state and the VM has been terminated.

 

RahulRana_6-1747376598098.png

 

 

Swapping the Regions - Migration of Virtual Machines


In this section, Region B becomes the source region and Region A becomes the destination.
Modifications made to the Region B Virtual Machines during the failover are preserved.

 

Swapping Production and Standby Site

 

RahulRana_7-1747376598099.png

 

The prerequisite for this reverse resync section is a failed-over replication relationship, as outlined above.

 

Delete the AMR CR on Region B:

oc delete amr amr-demo -n destination-vm-ns 

 

Capture changes since failover: Create a new base snapshot on Region B using the example Snapshot CR below:

apiVersion: protect.trident.netapp.io/v1 
kind: Snapshot 
metadata: 
  namespace: destination-vm-ns 
  name: snapshot-cr 
spec: 
  applicationRef: destination-vm-ns 
  appVaultRef: destination-bucket
  reclaimPolicy: Delete 

 
Create Snapshot CR using command below: 

oc apply -f example-file.yaml

 

Create snapshot schedule: Create a new snapshot schedule CR on Region B. 

apiVersion: protect.trident.netapp.io/v1 
kind: Schedule 
metadata: 
  name: snapshot-schedule 
  namespace: destination-vm-ns 
spec: 
  appVaultRef: destination-bucket 
  applicationRef: destination-vm-ns 
  backupRetention: "0" 
  enabled: true 
  granularity: custom 
  recurrenceRule: |- 
    DTSTART:20220101T000200Z 
    RRULE:FREQ=MINUTELY;INTERVAL=5 
  snapshotRetention: "5" 

 

Create the Schedule CR using the command below:

oc apply -f example-file.yaml

 

Delete the snapshot schedule on Region A (the schedule created earlier in the Source Cluster Setup section):

oc delete schedule snapshot-schedule -n source-vm-ns

 

Create a new AMR CR: Establish a new AppMirrorRelationship CR on Region A.

  • Ensure the namespace mapping is accurate (source and destination are swapped relative to the original AMR)
  • Ensure the AppVault references have been swapped if you are using separate source and destination AppVaults (the destination becomes the source and the source becomes the destination)
  • Ensure sourceApplicationName matches the name of the Application CR on Region B (the new source cluster)
  • Ensure sourceApplicationUID matches the .metadata.uid of that Application CR on Region B

Example of the AMR CR to be created on the Region A OCP cluster (note that the source and destination values are swapped relative to the original AMR, and the UID must come from the Application CR on Region B):

apiVersion: protect.trident.netapp.io/v1
kind: AppMirrorRelationship
metadata:
  name: amr-demo
  namespace: source-vm-ns
spec:
  desiredState: Established
  destinationAppVaultRef: source-bucket-appvault
  namespaceMapping:
    - destination: source-vm-ns
      source: destination-vm-ns
  recurrenceRule: |-
    DTSTART:20220101T000200Z
    RRULE:FREQ=MINUTELY;INTERVAL=5
  sourceAppVaultRef: destination-bucket
  sourceApplicationName: destination-vm-ns
  sourceApplicationUID: <uid-of-the-application-cr-on-region-b>
  storageClassName: ontap-sc

 

Create AMR CR using command below: 

oc apply -f example-file.yaml

 

Wait for AMR establishment: Wait for the AppMirrorRelationship to reach the "Established" state in Region A. 

oc get amr amr-demo -n source-vm-ns -w  

 

We can see below that the AMR is in the Established state in Region A.

 

RahulRana_8-1747376598099.png

 

Note: If the VM was in a running state in Region A, it is torn down as part of establishing the AMR.

 

Region B is now the primary site, with the VM operational and replicating to Region A.

 

RahulRana_9-1747376598099.png

 

 

RahulRana_10-1747376598099.png

 

Complete Disaster Recovery - Failover and Failback


In this scenario we revert to the original replication direction and state. We first replicate (resynchronize) any application changes back to the Region A Virtual Machines before reversing the replication direction.


Syncing changes back to the original Region A and bringing the application down in Region B

 

RahulRana_0-1747418803014.png

 

 

Reversing the replication direction from Region B to Region A 

 

RahulRana_0-1747418481756.png

 

 

The prerequisite for this section is having reverse resynced the failed-over replication relationship, as outlined above.

 

Disable Schedules on Region B - Delete any snapshot schedules in Region B.

oc delete schedule snapshot-schedule -n destination-vm-ns 

 

ShutdownSnapshot CR: Create a ShutdownSnapshot CR on Region B to take a final snapshot and gracefully shut down your application.

 

Configure the following attributes for the ShutdownSnapshot CR:

appVaultRef: (Required) This value must match the metadata.name of the Region B AppVault for the source application.

applicationRef: (Required) This value must match the metadata.name of the source Application CR on Region B.

 

Example of ShutdownSnapshot CR: 

apiVersion: protect.trident.netapp.io/v1 
kind: ShutdownSnapshot 
metadata: 
  name: replication-shutdown-snapshot 
  namespace: destination-vm-ns 
spec: 
  appVaultRef: destination-bucket 
  applicationRef: destination-vm-ns 

 

Create ShutdownSnapshot CR using command below: 

oc apply -f example-file.yaml

 

On Region B (the current source cluster), after the shutdown snapshot completes, get its status:

oc get shutdownsnapshot -n destination-vm-ns replication-shutdown-snapshot -o yaml

 

The shutdown snapshot has been created and the VM resources have been cleaned up in Region B:

RahulRana_13-1747376598099.png

 

After the ShutdownSnapshot has completed, get the name of the snapshot from the CR status, as mentioned in our documentation.

 

On Region B, find the value of shutdownsnapshot.status.appArchivePath using the command below, and record the last part of the file path (also called the basename; this is everything after the last slash):

oc get shutdownsnapshot -n destination-vm-ns replication-shutdown-snapshot -o jsonpath='{.status.appArchivePath}'

 

Perform a failover from Region B to Region A using the snapshot basename from the appArchivePath retrieved in the previous step.
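A minimal sketch of that failover, assuming the AMR created on Region A in the previous section and that your Trident Protect release exposes the promoted snapshot through a promotedSnapshot field in the AMR spec (check the failback section of the Trident Protect documentation for the exact field name); replace the basename placeholder with the value you recorded:

oc patch amr amr-demo -n source-vm-ns --type='json' -p '[{"op": "replace", "path": "/spec/desiredState", "value":"Promoted"}, {"op": "add", "path": "/spec/promotedSnapshot", "value":"<basename-from-appArchivePath>"}]'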

 

Then follow the reverse resync steps, this time from Region A to Region B.

 

Enable snapshot schedules on your original site, Region A, as described in the Source Cluster Setup section of this blog.
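For example, re-apply the snapshot Schedule CR shown earlier in the Source Cluster Setup section and verify that it is enabled:

oc apply -f example-file.yaml
oc get schedule snapshot-schedule -n source-vm-ns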

 

At this point you have completed the full cycle of failover and failback. VMs are again running in Region A and replicating to Region B.

 

RahulRana_14-1747376598098.png

 

Conclusion 
In conclusion, the integration of OpenShift Virtualization, NetApp Trident, and NetApp Trident Protect with AppMirror replication provides a powerful solution for protecting and mirroring critical applications. By leveraging this integrated solution, organizations can unlock enterprise-grade virtualization, simplify management, and improve data protection. Whether you're a seasoned IT professional or just starting to explore the world of virtualization, this integrated solution is definitely worth considering.

 

 
