Tech ONTAP Blogs
Written by @LuisRico and @Rahul-Rana
OpenShift Virtualization Disaster Recovery with NetApp Trident protect
Red Hat OpenShift Virtualization provides a virtualization platform on top of OpenShift to run and manage Windows and Linux virtual machines (VMs) alongside containers, using Kubernetes custom resources to enable common virtualization tasks.
It's based on the open-source KubeVirt project, which has become one of the most popular and active projects of the CNCF (Cloud Native Computing Foundation).
Although the KubeVirt project started in 2017, it has only gained real popularity as an alternative virtualization platform in recent years.
To run VMs with OpenShift Virtualization in production, the worker nodes of the OpenShift cluster must be bare-metal servers, to avoid nested virtualization. Even with that requirement, you can run it on premises and on the main hyperscalers.
Because OpenShift Virtualization runs on Kubernetes, it inherits the limitations of Kubernetes and OpenShift with respect to the practices virtualization administrators are used to. Kubernetes has no built-in disaster recovery (DR) tooling, and neither does OpenShift Virtualization.
For VMware vSphere users, who are accustomed to DR applications such as VMware Site Recovery Manager (SRM), a data protection tool with disaster recovery capabilities is a must. When protecting hundreds or thousands of virtual machines and their disks, the only feasible option is some form of storage replication that optimizes the data transfer between sites.
In the virtualization world, it's also common to have to comply with strict availability Service Level Agreements (SLAs) for the VMs, reflected in low Recovery Point Objective (RPO) and Recovery Time Objective (RTO) targets. That calls for fast, efficient storage replication that transfers data as quickly and frequently as possible (low RPO), and for automated failover that minimizes the time it takes to start the VMs on the secondary/destination site (low RTO).
NetApp Trident protect, which is available to NetApp customers at no additional cost alongside NetApp Trident, helps close this gap.
Prerequisites
Virtual Machine running on source OCP cluster:
ONTAP - Storage backends on the source and destination clusters must be peered (cluster and SVM peering), as described in our documentation.
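Peering is configured once at the ONTAP level; a minimal CLI sketch, assuming hypothetical cluster names (cluster-a, cluster-b), SVM names (svm-a, svm-b), and intercluster LIF addresses (follow the ONTAP/Trident documentation for the exact procedure in your environment):
cluster peer create -address-family ipv4 -peer-addrs 10.0.0.10,10.0.0.11
vserver peer create -vserver svm-a -peer-vserver svm-b -peer-cluster cluster-b -applications snapmirror
The first command is run on each cluster (pointing at the other cluster's intercluster LIFs); the second peers the SVMs that host the Trident backends and enables SnapMirror between them.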
AppVault – An AppVault (object storage bucket) stores the metadata (Kubernetes resources) for the virtual machine and its associated namespace, which is used during failover operations. We recommend creating two separate AppVault configurations, one for the source site and one for the destination site.
Cluster AppVault - Verify that the AppVault CR shared between the source and destination clusters has been created, following the example provided below. First, create a secret with the object storage credentials:
oc create secret generic <secret-name> \
--from-literal=accessKeyID=<objectstorage-accesskey> \
--from-literal=secretAccessKey=<ontap-s3-trident-protect-src-bucket-secret> \
-n trident-protect
apiVersion: protect.trident.netapp.io/v1
kind: AppVault
metadata:
  name: ontap-s3-trident-protect-src-bucket
  namespace: trident-protect
spec:
  dataMoverPasswordSecretRef: my-optional-data-mover-secret
  providerType: OntapS3
  providerConfig:
    s3:
      bucketName: trident-protect-src-bucket
      endpoint: s3.example.com
      proxyURL: http://10.1.1.1:3128
  providerCredentials:
    accessKeyID:
      valueFromSecret:
        key: accessKeyID
        name: s3-secret
    secretAccessKey:
      valueFromSecret:
        key: secretAccessKey
        name: s3-secret
Create the AppVault CR using the command below:
oc apply -f example-file.yaml
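Optionally, confirm that the AppVault was created and is available (the name below matches the example above):
oc get appvault ontap-s3-trident-protect-src-bucket -n trident-protect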
Application CR - The Application CR is a Kubernetes object that allows Trident protect to discover and manage one or more user-provided namespaces for data protection operations such as snapshots, backups, restores, or replication.
Define the CR for your source application using the example below:
apiVersion: protect.trident.netapp.io/v1
kind: Application
metadata:
  annotations:
    protect.trident.netapp.io/skip-vm-freeze: "false"
  name: demo-vm
  namespace: source-vm-ns
spec:
  includedNamespaces:
    - namespace: source-vm-ns
      labelSelector:
        matchLabels:
          app: demo-vm
  includedClusterScopedResources:
    - groupVersionKind:
        group: rbac.authorization.k8s.io
        kind: ClusterRole
        version: v1
      labelSelector:
        matchLabels:
          mylabel: test
Create the Application CR using the command below:
oc apply -f example-file.yaml
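You can confirm the Application CR was created in the VM's namespace (using the example name demo-vm from above):
oc get application demo-vm -n source-vm-ns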
Snapshot Schedule CR - The Schedule CR specifies the frequency (e.g., hourly) and retention (e.g., keep 24 snapshots) of data protection snapshots. These snapshots are then used for app replication from the source to the destination OCP cluster.
Define the snapshot Schedule CR using the example below:
apiVersion: protect.trident.netapp.io/v1
kind: Schedule
metadata:
  name: snapshot-schedule
  namespace: source-vm-ns
spec:
  appVaultRef: source-bucket
  applicationRef: source-vm-ns
  backupRetention: "0"
  enabled: true
  granularity: custom
  recurrenceRule: |-
    DTSTART:20220101T000200Z
    RRULE:FREQ=MINUTELY;INTERVAL=5
  snapshotRetention: "5"
Create the Schedule CR using the command below:
oc apply -f example-file.yaml
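Once the schedule is active, snapshots should start appearing at the configured interval; a quick way to check on the source cluster:
oc get schedule -n source-vm-ns
oc get snapshot -n source-vm-ns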
Destination Cluster AppVault CR: Create a common AppVault between the source and destination clusters so that the destination cluster can access the application metadata created on the source cluster. Also create a destination AppVault similar to the one created on the source cluster; a reference sketch follows.
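A minimal sketch of such a destination AppVault CR, assuming a hypothetical destination bucket, endpoint, and credentials secret (adjust these for your destination site):
apiVersion: protect.trident.netapp.io/v1
kind: AppVault
metadata:
  name: ontap-s3-trident-protect-dst-bucket
  namespace: trident-protect
spec:
  providerType: OntapS3
  providerConfig:
    s3:
      bucketName: trident-protect-dst-bucket
      endpoint: s3.example.com
  providerCredentials:
    accessKeyID:
      valueFromSecret:
        key: accessKeyID
        name: s3-secret-dst
    secretAccessKey:
      valueFromSecret:
        key: secretAccessKey
        name: s3-secret-dst
Apply the source AppVault CR on the destination cluster as well, so that the destination cluster can read the metadata written by the source.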
Configuring AMR
AppMirrorRelationship CR: The AppMirrorRelationship (AMR) CR in NetApp Trident protect manages the replication of applications across OCP clusters, including its frequency, for disaster recovery and application mobility, leveraging NetApp SnapMirror technology. It defines the replication relationship between a source application and a destination, enabling failover, failback, or workload migration between OCP clusters.
Define the application mirror relationship with a replication schedule.
Example of AMR CR:
apiVersion: protect.trident.netapp.io/v1
kind: AppMirrorRelationship
metadata:
  name: amr-demo
  namespace: destination-vm-ns
spec:
  desiredState: Established
  destinationAppVaultRef: destination-bucket
  namespaceMapping:
    - destination: destination-vm-ns
      source: source-vm-ns
  recurrenceRule: |-
    DTSTART:20220101T000200Z
    RRULE:FREQ=MINUTELY;INTERVAL=5
  sourceAppVaultRef: source-bucket-appvault
  sourceApplicationName: source-vm-ns
  sourceApplicationUID: 7498d32c-328e-4ddd-9029-122540866aeb
  storageClassName: ontap-sc
Create the AMR CR using the command below:
oc apply -f example-file.yaml
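The sourceApplicationUID above must match the UID of the Application CR on the source cluster; one way to look it up (using the example Application name demo-vm):
oc get application demo-vm -n source-vm-ns -o jsonpath='{.metadata.uid}'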
As we can see below, the AMR has been established, and the Region B namespace contains only the PVC.
Production Site Region A and Standby Site Region B
In this section we perform disaster recovery without impacting our production site, Region A. Towards the end we bring down the VMs running in Region B, again without affecting the original site, Region A. In a failover scenario, the standby system or environment takes over the responsibilities of the primary, ensuring business continuity.
Failover without affecting the source region.
Execute Failover on Region B:
oc patch amr -n destination-vm-ns amr-demo --type='json' -p '[{"op": "replace", "path": "/spec/desiredState", "value":"Promoted"}]'
You can observe the AMR status change from Promoting to Promoted:
oc get amr amr-demo -n destination-vm-ns -w
The VM is up and running on Region B after the failover.
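You can double-check from the CLI on Region B by listing the VirtualMachine and VirtualMachineInstance objects in the failed-over namespace:
oc get vm,vmi -n destination-vm-ns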
Execute the command below to discard the changes on the Region B cluster and re-establish the mirror relationship from Region A to Region B:
oc patch amr -n destination-vm-ns amr-demo --type='json' -p '[{"op": "replace", "path": "/spec/desiredState", "value":"Established"}]'
Ensure the AMR state has changed from Establishing to Established:
oc get amr amr-demo -n destination-vm-ns -w
As shown below, the AMR is back in the Established state and the VM has been terminated.
In this section Region B becomes the source region and Region A becomes the destination. Modifications made to the Region B virtual machines during failover are preserved.
Swapping Production and Standby Site
The prerequisite for this reverse resync section is a failed-over replication relationship, as outlined above.
Delete the AMR CR on Region B:
oc delete amr amr-demo -n destination-vm-ns
Capture changes since failover: Create a new baseline snapshot on Region B using the example Snapshot CR below:
apiVersion: protect.trident.netapp.io/v1
kind: Snapshot
metadata:
  namespace: destination-vm-ns
  name: snapshot-cr
spec:
  applicationRef: destination-vm-ns
  appVaultRef: destination-bucket
  reclaimPolicy: Delete
Create the Snapshot CR using the command below:
oc apply -f example-file.yaml
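Wait for this snapshot to complete before continuing; you can watch it on Region B:
oc get snapshot snapshot-cr -n destination-vm-ns -w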
Create snapshot schedule: Create a new snapshot schedule CR on Region B.
apiVersion: protect.trident.netapp.io/v1
kind: Schedule
metadata:
  name: snapshot-schedule
  namespace: destination-vm-ns
spec:
  appVaultRef: destination-bucket
  applicationRef: destination-vm-ns
  backupRetention: "0"
  enabled: true
  granularity: custom
  recurrenceRule: |-
    DTSTART:20220101T000200Z
    RRULE:FREQ=MINUTELY;INTERVAL=5
  snapshotRetention: "5"
Create the Schedule CR using the command below:
oc apply -f example-file.yaml
Delete the snapshot schedule on Region A (the schedule created earlier in the prerequisites):
oc delete schedule snapshot-schedule -n source-vm-ns
Create a new AMR CR: Establish a new AppMirrorRelationship CR on Region A, which is now the destination. The example below mirrors the original AMR; update the namespace, AppVault references, namespace mapping, and source application name/UID so that Region B is the source and Region A the destination:
apiVersion: protect.trident.netapp.io/v1
kind: AppMirrorRelationship
metadata:
  name: amr-demo
  namespace: destination-vm-ns
spec:
  desiredState: Established
  destinationAppVaultRef: destination-bucket
  namespaceMapping:
    - destination: destination-vm-ns
      source: source-vm-ns
  recurrenceRule: |-
    DTSTART:20220101T000200Z
    RRULE:FREQ=MINUTELY;INTERVAL=5
  sourceAppVaultRef: source-bucket-appvault
  sourceApplicationName: source-vm-ns
  sourceApplicationUID: 7498d32c-328e-4ddd-9029-122540866aeb
  storageClassName: ontap-sc
Create the AMR CR using the command below:
oc apply -f example-file.yaml
Wait for AMR establishment: Wait for the AppMirrorRelationship to reach the "Established" state in Region A.
oc get amr amr-demo -n source-vm-ns -w
We can see below that the AMR is in the Established state in Region A.
Note: If the VM was running, it is torn down as part of establishing the AMR.
Region B is now the primary site, with the VM operational and replicating to Region A.
In this scenario we revert to the original replication direction and state. We first replicate (resynchronize) any application changes to the Region A virtual machines before reversing the replication direction.
Syncing changes back to original Region A and bringing the App down on Region B
Reversing the replication direction from Region B to Region A
The prerequisite for this section is a failed-over replication relationship that has been reverse resynced, as outlined above.
Disable Schedules on Region B - Delete any snapshot schedules in Region B.
oc delete schedule snapshot-schedule -n destination-vm-ns
ShutdownSnapshot CR: Create a ShutdownSnapshot CR on Region B to take a final snapshot and gracefully shut down your application.
Configure the following attributes for the ShutdownSnapshot CR:
Region B appVaultRef: (Required) This value must match the metadata.name field of the AppVault for the source application.
Region B applicationRef: (Required) This value must match the metadata.name field of the source application CR.
Example of ShutdownSnapshot CR:
apiVersion: protect.trident.netapp.io/v1
kind: ShutdownSnapshot
metadata:
  name: replication-shutdown-snapshot
  namespace: destination-vm-ns
spec:
  appVaultRef: destination-bucket
  applicationRef: destination-vm-ns
Create the ShutdownSnapshot CR using the command below:
oc apply -f example-file.yaml
On the source cluster (now Region B), after the shutdown snapshot completes, get the status of the shutdown snapshot:
oc get shutdownsnapshot -n destination-vm-ns <shutdown_snapshot_name> -o yaml
The shutdown snapshot has been created and the VM resources have been cleaned up in Region B.
After the ShutdownSnapshot has completed, get the name of the snapshot from the CR status as mentioned in our documentation.
On the source cluster (Region B), find the value of shutdownsnapshot.status.appArchivePath using the command below, and record the last part of the file path (also called the basename; this is everything after the last slash):
oc get shutdownsnapshot -n destination-vm-ns <shutdown_snapshot_name> -o jsonpath='{.status.appArchivePath}'
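If you prefer, you can extract the basename in one step by piping the same output through awk (a small shell convenience, not a Trident protect command):
oc get shutdownsnapshot -n destination-vm-ns <shutdown_snapshot_name> -o jsonpath='{.status.appArchivePath}' | awk -F/ '{print $NF}'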
Perform a failover from Region B to Region A using the snapshot basename from the appArchivePath retrieved in the previous step.
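A hedged sketch of that failover, run on Region A against the AMR created during the reverse resync: patch it to Promoted and reference the recorded basename. The promotedSnapshot field name is taken from the Trident protect failback flow; confirm it against the documentation for your release.
oc patch amr -n source-vm-ns amr-demo --type='json' -p '[{"op": "replace", "path": "/spec/desiredState", "value":"Promoted"}, {"op": "add", "path": "/spec/promotedSnapshot", "value":"<snapshot-basename>"}]'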
Follow Reverse Resync steps from Region A to Region B.
Re-enable the snapshot schedules on your original site, Region A, as described in the prerequisites section of the blog.
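For example, re-applying the Region A Schedule CR from the prerequisites (saved here under a hypothetical file name) turns scheduled snapshots back on:
oc apply -f snapshot-schedule-region-a.yaml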
At this point you have completed the full cycle of failover and failback. The VMs are again running in Region A and replicating to Region B.
Conclusion
In conclusion, the integration of OpenShift Virtualization, NetApp Trident, and Trident protect's AppMirror replication provides a powerful solution for protecting and mirroring critical applications. By leveraging this integrated solution, organizations can unlock enterprise-grade virtualization, simplify management, and improve data protection. Whether you're a seasoned IT professional or just starting to explore the world of virtualization, this integrated solution is worth considering.