Enhancing Trident protect backups with bucket replication for disaster recovery across regions

PatricU
NetApp

In a recent series of blog posts, we introduced the NetApp® Trident™ protect advanced application data management (ADM) and protection capabilities for stateful Kubernetes applications, along with the new Trident protect CLI. We discussed how its Kubernetes-native custom resources (CRs) can facilitate the integration of application protection into your automation and deployment workflows by using either manifests or the Trident protect CLI. In addition, we explored how Trident protect can seamlessly integrate into your GitOps workflow.

 

Most recently, another Community blog post delved into how NetApp Trident protect, built on top of the NetApp ONTAP® SnapMirror® feature, can empower your business to achieve seamless application mobility and disaster recovery (DR) for mission-critical applications. As a result, your organization can attain very low recovery point objectives (RPOs) and recovery time objectives (RTOs).

 

However, not all workloads require an RTO and RPO of minutes or less. For those workloads, backup and restore can still be the most appropriate strategy, because it is the simplest and least expensive to implement. By replicating your backup data to another data center or region, you can recover even from large-scale disasters: if a disaster prevents your workload from operating in a region, the workload can be restored to a recovery region or data center and can continue operations from there.

 

When using public cloud Kubernetes services, you can even use an account or a subscription that differs from your primary region, with distinct credentials. This approach can prevent human error or malicious actions in one region from affecting another, enabling you to recover your services even if, for example, a ransomware attack compromises your primary account. Also, replicating your backup data from an on-premises object storage bucket to a cloud-based object storage bucket and then restoring your services in the cloud during a disaster is another effective strategy to protect your business-critical applications.

 

You might decide, for example, to store only replicated backup data in the cloud while keeping your production environment in your own data center. With this hybrid approach, you still gain the advantages of scalability and geographic distance without having to move your production environment. In a cloud-to-cloud model, both production and DR are in the cloud, although at different sites and in different subscriptions to maintain enough physical and logical separation.

 

NetApp Trident protect provides advanced application data management capabilities that enhance the functionality and availability of stateful Kubernetes applications backed by NetApp ONTAP storage systems and the NetApp Trident Container Storage Interface (CSI) storage provisioner. It is compatible with a wide range of fully managed and self-managed Kubernetes offerings (see the supported Kubernetes distributions and storage back ends), making it an optimal solution for protecting your Kubernetes services across various platforms and regions.

 

In this blog post, I show you how to combine Trident protect backup and restore with bucket replication between two different regions, with clusters and buckets hosted on Amazon Web Services (AWS). However, other supported clusters and object storage solutions work the same way. For example, the NetApp StorageGRID® object-based storage solution provides the CloudMirror replication service for its S3 buckets.

Prerequisites

This blog post walks through a basic backup and restore workflow of a sample application with persistent data running on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster backed by Amazon FSx for NetApp ONTAP storage. The Trident protect backups are stored in an Amazon S3 bucket in the eu-west-1 region and are replicated to an Amazon S3 bucket in eu-central-1. From there, we restore the sample application into another Amazon EKS cluster in the eu-central-1 region.

 

If you plan to follow the process in this blog post step by step, you must have the following available:

  • Two Kubernetes clusters, ideally in different regions, with the latest versions of Trident and Trident protect installed, and their associated kubeconfig files
  • NetApp ONTAP storage for each cluster, with Trident storage back ends, storage classes, and volume snapshot classes configured
  • Two object storage buckets for storing backup data and metadata, with bucket replication configured between them
  • A workstation with kubectl installed and configured to use the kubeconfig files of both clusters (see the context-switching example after this list)
  • The Trident protect CLI, tridentctl-protect, installed on your workstation
  • Admin permissions on both Kubernetes clusters
  • The AWS CLI installed on your workstation
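
With the kubeconfig files of both clusters merged into the KUBECONFIG environment variable, switching between the clusters could, for example, look like this (a minimal sketch; the file names and context names are assumptions for illustration):

$ export KUBECONFIG=~/.kube/eks-source-cluster.yaml:~/.kube/eks-dest-cluster.yaml
$ kubectl config get-contexts -o name
eks-dest-cluster
eks-source-cluster

$ kubectl config use-context eks-source-cluster
Switched to context "eks-source-cluster".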

Set up buckets and bucket replication

To set up the Amazon S3 buckets and the replication between them for our sample environment, we follow this example AWS walkthrough. We create two Amazon S3 buckets, pu-repl-source in the eu-west-1 region and pu-repl-dest in the eu-central-1 region. In the AWS console, we confirm the created replication configuration, as shown in Figure 1.

Figure 1) Replication configuration between buckets pu-repl-source and pu-repl-dest.
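
If you prefer to script the setup instead of using the console, the same configuration can be sketched with the AWS CLI as follows. Note that replication requires versioning on both buckets, and the IAM role ARN shown here is a placeholder for a replication role with the permissions described in the AWS walkthrough.

$ aws s3api create-bucket --bucket pu-repl-source --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1
$ aws s3api create-bucket --bucket pu-repl-dest --region eu-central-1 --create-bucket-configuration LocationConstraint=eu-central-1

$ aws s3api put-bucket-versioning --bucket pu-repl-source --versioning-configuration Status=Enabled
$ aws s3api put-bucket-versioning --bucket pu-repl-dest --versioning-configuration Status=Enabled

$ aws s3api put-bucket-replication --bucket pu-repl-source --replication-configuration '{
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [{
      "ID": "replicate-to-eu-central-1",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": {"Status": "Enabled"},
      "Destination": {"Bucket": "arn:aws:s3:::pu-repl-dest"}
    }]
  }'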

Let’s quickly test the configured bucket replication by using the AWS CLI. We list the two buckets and confirm that they are empty.

$ aws s3 ls | grep pu
2025-03-27 15:24:27 pu-repl-dest
2025-03-27 17:07:10 pu-repl-source

$ aws s3 ls s3://pu-repl-source

$ aws s3 ls s3://pu-repl-dest

Now we upload a random image file to the source bucket pu-repl-source and confirm that it’s replicated to the destination bucket pu-repl-dest.

$ aws s3 cp ~/Downloads/Image.jpeg s3://pu-repl-source
upload: Downloads/Image.jpeg to s3://pu-repl-source/Image.jpeg
 
$ aws s3 ls s3://pu-repl-source
2025-03-28 11:24:49     379931 Image.jpeg

$ aws s3 ls s3://pu-repl-dest
2025-03-28 11:24:49     379931 Image.jpeg

Because the replication rule also replicates delete markers, deleting the object from the source bucket removes it from the destination bucket as well.

$ aws s3 rm s3://pu-repl-source/Image.jpeg
delete: s3://pu-repl-source/Image.jpeg

$ aws s3 ls s3://pu-repl-source
 
$ aws s3 ls s3://pu-repl-dest

Configure clusters and a sample application

For the tests, we use two Amazon EKS clusters in the same AWS regions where the Amazon S3 buckets are. Both clusters have persistent storage that’s backed by Amazon FSx for NetApp ONTAP storage and that’s provisioned through NetApp Trident. Both also have the following storage classes available, backed by the respective Trident back ends.

$ kubectl get sc
NAME                        PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
fsx-netapp-block            csi.trident.netapp.io   Delete          Immediate              true                   12m
fsx-netapp-file (default)   csi.trident.netapp.io   Delete          Immediate              true                   12m
gp2                         kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  32m

$ tridentctl get backends
+-----------------------+----------------+--------------------------------------+--------+------------+---------+
|         NAME          | STORAGE DRIVER |                 UUID                 | STATE  | USER-STATE | VOLUMES |
+-----------------------+----------------+--------------------------------------+--------+------------+---------+
| backend-fsx-ontap-nas | ontap-nas      | fa49d82c-8600-4114-be54-49947ebbe80a | online | normal     |       0 |
| backend-fsx-ontap-san | ontap-san      | a0e43009-b193-40c8-af66-24ad21c65dfb | online | normal     |       0 |
+-----------------------+----------------+--------------------------------------+--------+------------+---------+
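
For reference, a storage class and a volume snapshot class backing such a setup can be defined with manifests along these lines (a sketch matching the fsx-netapp-file class and the ontap-nas back end shown above; the snapshot class name is an assumption):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fsx-netapp-file
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas"
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: fsx-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete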

Sample application

For our testing purposes, we deploy a simple Alpine container with a persistent volume that’s backed by Amazon FSx for NetApp ONTAP storage on the Amazon EKS cluster eks-source-cluster in the namespace alpine.

$ kubectl apply -f - <<EOF
> apiVersion: v1
> kind: Namespace
> metadata:
>   name: alpine
>   labels:
>     app: alpine
> ---
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   labels:
>     app: alpine
>   name: alpine
>   namespace: alpine
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: alpine
>   strategy: {}
>   template:
>     metadata:
>       creationTimestamp: null
>       labels:
>         app: alpine
>     spec:
>       containers:
>       - image: alpine:latest
>         name: alpine-container
>         command: ["/bin/sh", "-c", "sleep infinity"]  # Keep the container running
>         volumeMounts:
>         - mountPath: /data
>           name: data
>       volumes:
>       - name: data
>         persistentVolumeClaim:
>           claimName: alpinedata
> ---
> apiVersion: v1
> kind: PersistentVolumeClaim
> metadata:
>   name: alpinedata
>   namespace: alpine
> spec:
>   accessModes:
>   - ReadWriteMany
>   resources:
>     requests:
>       storage: 2Gi
>   storageClassName: fsx-netapp-file
> EOF
namespace/alpine created
deployment.apps/alpine created
persistentvolumeclaim/alpinedata created
 
$ kubectl get all,pvc -n alpine
NAME                          READY   STATUS    RESTARTS   AGE
pod/alpine-5bdb97fb48-f9wgr   1/1     Running   0          78s
 
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/alpine   1/1     1            1           79s
 
NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/alpine-5bdb97fb48   1         1         1       79s
 
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/alpinedata   Bound    pvc-d5c38879-83f5-49b0-9ad9-667792dd3154   2Gi        RWX            fsx-netapp-file   <unset>                 79s

Let’s also add some random data files to the persistent volume.

$ kubectl -n alpine exec -it pod/alpine-5bdb97fb48-f9wgr -- df -h /data
Filesystem                Size      Used Available Use% Mounted on
198.19.255.230:/trident_pvc_d5c38879_83f5_49b0_9ad9_667792dd3154
                          2.0G    768.0K      2.0G   0% /data

$ for i in 1 2 3 4 5; do kubectl -n alpine exec -it pod/alpine-5bdb97fb48-f9wgr -- dd if=/dev/urandom of=/data/file${i} bs=1024k count=100; done
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 1.169176 seconds, 85.5MB/s
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 1.169829 seconds, 85.5MB/s
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 1.165403 seconds, 85.8MB/s
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 1.167324 seconds, 85.7MB/s
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 1.175639 seconds, 85.1MB/s

$ kubectl -n alpine exec -it pod/alpine-5bdb97fb48-f9wgr -- df -h /data
Filesystem                Size      Used Available Use% Mounted on
198.19.255.230:/trident_pvc_d5c38879_83f5_49b0_9ad9_667792dd3154
                          2.0G    509.2M      1.5G  25% /data

Manage and protect the sample application

Before we can protect the sample application with Trident protect on the primary cluster, we need to create the AppVault CR that provides Trident protect with the access details for the Amazon S3 bucket storing the backup data. First, we store the Amazon S3 access credentials for the source bucket pu-repl-source in the secret pu-repl-source-secret in the trident-protect namespace.

$ kubectl create secret generic pu-repl-source-secret --from-literal=accessKeyID=<REDACTED> --from-literal=secretAccessKey=<REDACTED> -n trident-protect
secret/pu-repl-source-secret created
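
With the secret in place, we create the AppVault CR pu-repl-source for the source bucket, mirroring the command we use later on the destination cluster (the endpoint reflects the bucket's eu-west-1 region):

$ tridentctl-protect create appvault AWS pu-repl-source --bucket pu-repl-source --secret pu-repl-source-secret --endpoint s3.eu-west-1.amazonaws.com -n trident-protect
AppVault "pu-repl-source" created.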

Now we can create the alpine Trident protect application by registering the complete alpine namespace as an application in Trident protect.

$ tridentctl-protect create application alpine --namespaces alpine -n alpine
Application "alpine" created.
 
$ tridentctl-protect get application -A
+-----------+--------+------------+-------+-----+
| NAMESPACE |  NAME  | NAMESPACES | STATE | AGE |
+-----------+--------+------------+-------+-----+
| alpine    | alpine | alpine     | Ready | 14s |
+-----------+--------+------------+-------+-----+
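
Under the hood, the CLI creates an Application CR. A sketch of the equivalent manifest, assuming the documented Application CR schema:

apiVersion: protect.trident.netapp.io/v1
kind: Application
metadata:
  name: alpine
  namespace: alpine
spec:
  includedNamespaces:
    - namespace: alpine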

To regularly protect the application, we also create a protection schedule, making hourly backups to the AppVault pu-repl-source and retaining the last three backups and snapshots, again using the Trident protect CLI.

$ tridentctl-protect create schedule --app alpine --appvault pu-repl-source --snapshot-retention 3 --backup-retention 3 --granularity Hourly --minute 10 -n alpine
Schedule "alpine-x7ye33" created.
 
$ tridentctl-protect get schedules -A
+-----------+---------------+--------+---------------+---------+-------+-------+-----+
| NAMESPACE |     NAME      |  APP   |   SCHEDULE    | ENABLED | STATE | ERROR | AGE |
+-----------+---------------+--------+---------------+---------+-------+-------+-----+
| alpine    | alpine-x7ye33 | alpine | Hourly:min=10 | true    |       |       | 4s  |
+-----------+---------------+--------+---------------+---------+-------+-------+-----+
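
The same schedule can also be expressed declaratively, for example in GitOps workflows. A sketch of the corresponding Schedule CR, assuming the documented schema:

apiVersion: protect.trident.netapp.io/v1
kind: Schedule
metadata:
  name: alpine-hourly
  namespace: alpine
spec:
  applicationRef: alpine
  appVaultRef: pu-repl-source
  granularity: hourly
  minute: "10"
  backupRetention: "3"
  snapshotRetention: "3"
  enabled: true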

After the first backup is complete, we check the content of the AppVault pu-repl-source.

$ tridentctl-protect get backup -n alpine
+-----------------------------+--------+----------------+-----------+-------+-------+
|            NAME             |  APP   | RECLAIM POLICY |   STATE   | ERROR |  AGE  |
+-----------------------------+--------+----------------+-----------+-------+-------+
| hourly-54241-20250401141000 | alpine | Retain         | Completed |       | 3m52s |
+-----------------------------+--------+----------------+-----------+-------+-------+
 
$ tridentctl-protect get appvaultcontent pu-repl-source
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+
|      CLUSTER       |  APP   |  TYPE  |            NAME             | NAMESPACE |         TIMESTAMP         |
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+
| eks-source-cluster | alpine | backup | hourly-54241-20250401141000 | alpine    | 2025-04-01 14:11:50 (UTC) |
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+

Configure the destination cluster

On our destination Amazon EKS cluster eks-dest-cluster in the eu-central-1 region, we configure Trident protect to access the replicated backup content in the destination Amazon S3 bucket pu-repl-dest. After creating the secret pu-repl-dest-secret with the access credentials for the destination bucket, we create the AppVault CR pu-repl-dest with the Trident protect CLI, allowing Trident protect to access the bucket.

$ kubectl create secret generic pu-repl-dest-secret --from-literal=accessKeyID=<REDACTED> --from-literal=secretAccessKey=<REDACTED> -n trident-protect
secret/pu-repl-dest-secret created

$ tridentctl-protect create appvault AWS pu-repl-dest --bucket pu-repl-dest --secret pu-repl-dest-secret --endpoint s3.eu-central-1.amazonaws.com -n trident-protect
AppVault "pu-repl-dest" created.
 
$ tridentctl-protect get appvault --show-full-error
+--------------+----------+-----------+-------+---------+-------+
|     NAME     | PROVIDER |   STATE   | ERROR | MESSAGE |  AGE  |
+--------------+----------+-----------+-------+---------+-------+
| pu-repl-dest | AWS      | Available |       |         | 8m    |
+--------------+----------+-----------+-------+---------+-------+
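
For declarative workflows, the same AppVault can also be defined with a manifest along these lines (a sketch, assuming the documented AppVault CR schema):

apiVersion: protect.trident.netapp.io/v1
kind: AppVault
metadata:
  name: pu-repl-dest
  namespace: trident-protect
spec:
  providerType: AWS
  providerConfig:
    s3:
      bucketName: pu-repl-dest
      endpoint: s3.eu-central-1.amazonaws.com
  providerCredentials:
    accessKeyID:
      valueFromSecret:
        name: pu-repl-dest-secret
        key: accessKeyID
    secretAccessKey:
      valueFromSecret:
        name: pu-repl-dest-secret
        key: secretAccessKey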

Now we can check the content of the replicated bucket by using the get appvaultcontent command of the Trident protect CLI.

$ tridentctl-protect get appvaultcontent pu-repl-dest
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+
|      CLUSTER       |  APP   |  TYPE  |            NAME             | NAMESPACE |         TIMESTAMP         |
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+
| eks-source-cluster | alpine | backup | hourly-54241-20250401141000 | alpine    | 2025-04-01 14:11:50 (UTC) |
| eks-source-cluster | alpine | backup | hourly-54241-20250401151000 | alpine    | 2025-04-01 15:11:38 (UTC) |
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+

In the meantime, the protection schedule on the primary cluster eks-source-cluster has created a second hourly backup of the sample application, so we now see two backups in the replicated bucket and can start a restore test.

Perform a restore test on the DR site

We test a restore on the DR cluster eks-dest-cluster with the most recent backup, hourly-54241-20250401151000, that’s available in the replicated Amazon S3 bucket pu-repl-dest. To use the create backuprestore command on the DR cluster, we first need to determine the path of the backup archive in the AppVault with the --show-paths option of the get appvaultcontent command.

$ tridentctl-protect get appvaultcontent pu-repl-dest --show-paths
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+----------------------------------------------------------------------------------------------------------------------+
|      CLUSTER       |  APP   |  TYPE  |            NAME             | NAMESPACE |         TIMESTAMP         |                                                         PATH                                                         |
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+----------------------------------------------------------------------------------------------------------------------+
| eks-source-cluster | alpine | backup | hourly-54241-20250401141000 | alpine    | 2025-04-01 14:11:50 (UTC) | alpine_ea2ea171-1c23-40bb-8625-d33a7e7c2edd/backups/hourly-54241-20250401141000_3a9b5d2e-5d0c-4bc9-86cc-f53f84cf7faa |
| eks-source-cluster | alpine | backup | hourly-54241-20250401151000 | alpine    | 2025-04-01 15:11:38 (UTC) | alpine_ea2ea171-1c23-40bb-8625-d33a7e7c2edd/backups/hourly-54241-20250401151000_f41b3bf0-2aff-404b-90c3-c9c1a960fcce |
+--------------------+--------+--------+-----------------------------+-----------+---------------------------+----------------------------------------------------------------------------------------------------------------------+

With the path value alpine_ea2ea171-1c23-40bb-8625-d33a7e7c2edd/backups/hourly-54241-20250401151000_f41b3bf0-2aff-404b-90c3-c9c1a960fcce of the most recent backup, we can now start the restore from the replicated backup on the destination cluster.

$ tridentctl-protect create backuprestore --appvault pu-repl-dest --path alpine_ea2ea171-1c23-40bb-8625-d33a7e7c2edd/backups/hourly-54241-20250401151000_f41b3bf0-2aff-404b-90c3-c9c1a960fcce --namespace-mapping alpine:alpine -n alpine
BackupRestore "alpine-heqp4w" created.
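
The CLI command creates a BackupRestore CR behind the scenes; a sketch of the equivalent manifest, assuming the documented schema (the generated name is taken from the CLI output above):

apiVersion: protect.trident.netapp.io/v1
kind: BackupRestore
metadata:
  name: alpine-heqp4w
  namespace: alpine
spec:
  appVaultRef: pu-repl-dest
  appArchivePath: alpine_ea2ea171-1c23-40bb-8625-d33a7e7c2edd/backups/hourly-54241-20250401151000_f41b3bf0-2aff-404b-90c3-c9c1a960fcce
  namespaceMapping:
    - source: alpine
      destination: alpine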

We follow the progress of the restore, which finishes quickly.

$ kubectl -n alpine get backuprestore alpine-heqp4w -w
NAME            STATE     ERROR   AGE
alpine-heqp4w   Running           17s
alpine-heqp4w   Running           23s
alpine-heqp4w   Running           23s
alpine-heqp4w   Running           36s
alpine-heqp4w   Running           36s
alpine-heqp4w   Running           36s
alpine-heqp4w   Running           36s
alpine-heqp4w   Running           36s
alpine-heqp4w   Running           40s
alpine-heqp4w   Running           40s
alpine-heqp4w   Running           40s
alpine-heqp4w   Running           40s
alpine-heqp4w   Running           47s
alpine-heqp4w   Running           47s
alpine-heqp4w   Running           47s
alpine-heqp4w   Running           47s
alpine-heqp4w   Running           47s
alpine-heqp4w   Running           47s
alpine-heqp4w   Running           47s
alpine-heqp4w   Completed           47s

The sample application comes up successfully after the restore, and the sample data files are also available, so the replicated backup was valid.

$ kubectl get all,pvc -n alpine
NAME                          READY   STATUS    RESTARTS   AGE
pod/alpine-5bdb97fb48-2lnb5   1/1     Running   0          52s
 
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/alpine   1/1     1            1           52s
 
NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/alpine-5bdb97fb48   1         1         1       52s
 
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/alpinedata   Bound    pvc-b5d6ced6-5734-40ae-8c9e-b473b032e59e   2Gi        RWX            fsx-netapp-file   <unset>                 55s
 
$ kubectl -n alpine exec -it pod/alpine-5bdb97fb48-2lnb5 -- df -h /data
Filesystem                Size      Used Available Use% Mounted on
198.19.255.178:/trident_pvc_b5d6ced6_5734_40ae_8c9e_b473b032e59e
                          2.0G    508.7M      1.5G  25% /data
 
$ kubectl -n alpine exec -it pod/alpine-5bdb97fb48-2lnb5 -- ls -l /data
total 474848
-rw-r--r--    1 root     root     104857600 Apr  1 13:51 file1
-rw-------    1 root     root     104857600 Apr  2 11:03 file2
-rw-r--r--    1 root     root     104857600 Apr  1 13:51 file3
-rw-r--r--    1 root     root     104857600 Apr  1 13:51 file4
-rw-r--r--    1 root     root     104857600 Apr  1 13:51 file5
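
To verify the restored data beyond a simple file listing, you could also compare checksums of the files across the two clusters, for example (a sketch; the kubectl context names are assumptions):

$ kubectl --context eks-source-cluster -n alpine exec pod/alpine-5bdb97fb48-f9wgr -- md5sum /data/file1
$ kubectl --context eks-dest-cluster -n alpine exec pod/alpine-5bdb97fb48-2lnb5 -- md5sum /data/file1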

Conclusion and call to action

In conclusion, the integration of NetApp Trident protect with bucket replication for DR across regions offers a robust and cost-effective solution for maintaining the availability and protection of your business-critical stateful Kubernetes applications. By using object storage bucket replication, you can achieve geographic redundancy and safeguard your critical data against regional failures and disasters. The step-by-step guide provided in this blog post demonstrates how to configure and use Trident protect for backup and restore operations so that your applications can be quickly and reliably restored in a DR scenario.

 

The seamless integration of Trident protect with Kubernetes-native CRs and the Trident protect CLI simplifies the automation of backup and restore processes, making it easier to integrate these critical operations into your existing workflows. This approach not only enhances data protection but also provides flexibility in managing your backups across different environments, whether on premises, hybrid, or cloud based.

 

By implementing the strategies outlined in this blog post, your organization can effectively mitigate risks, maintain business continuity, and keep your applications resilient and recoverable in the face of unforeseen events. If you’re seeking to enhance your DR capabilities, NetApp Trident protect offers a comprehensive and scalable solution that you can tailor to meet the specific needs of your Kubernetes environments.

 

If you want to see for yourself how easy it is to protect persistent Kubernetes applications with Trident protect, get started today!
