
Automating registry failover for disaster recovery with Trident protect post-restore hooks

PatricU
NetApp

Introduction

Disaster recovery for business-critical Kubernetes applications often requires using replicated private registries to pull the container images locally on the DR site in case of a complete failure (including the registry) of the primary site. This can be the case for both on-premises and cloud-based Kubernetes deployments. Therefore, it’s essential for the backup system used to protect these critical Kubernetes applications to have the ability to modify Kubernetes configurations after a restore. That’s also important for other aspects that might need to be changed on the DR site, like ingress configuration.

 

NetApp® Trident™ protect provides application-aware data protection, mobility, and disaster recovery for any workload running on any Kubernetes distribution, leveraging NetApp’s proven and expansive storage portfolio in the public cloud and on premises. Trident protect enables administrators to easily protect, back up, migrate, and create working clones of Kubernetes applications, through either its CLI or Kubernetes-native custom resource definitions (CRDs).

 

Trident protect offers various types of execution hooks—custom scripts that you can configure to run in conjunction with a data protection operation of a managed app. With a post-restore hook, you can for example change the container image URL after an application restore to a DR site. Read on to find out how.

Setup

In this blog, we use the post-restore URL rewrite hook example with Amazon Elastic Container Registry (ECR) cross-region replication (CRR) to demonstrate how to restore an NGINX sample application. The sample application originally runs on an Amazon Elastic Kubernetes Service (EKS) cluster in the eu-west-1 region and is to be failed over to a DR cluster in the eu-north-1 region in case of a disaster. The NGINX container image is pulled from private image repositories in the respective regions.

Note: Although we use ECR in this blog post, the overall process is the same regardless of your private container registry of choice, in the cloud or on premises.

Enable CRR for ECR registry to DR site

After creating a private registry in Amazon Web Services (AWS), we followed the steps in the AWS documentation to configure private image replication from eu-west-1 to the eu-north-1 region:

$ aws ecr describe-registry --region eu-west-1
{
    "registryId": "467886448844",
    "replicationConfiguration": {
        "rules": [
            {
                "destinations": [
                    {
                        "region": "eu-north-1",
                        "registryId": "467886448844"
                    }
                ]
            }
        ]
    }
}

Now all content pushed to repositories in eu-west-1 is automatically replicated to eu-north-1. Amazon ECR keeps the destination and source synchronized.
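
For reference, a replication rule like the one above can be created with the AWS CLI. A minimal sketch, assuming the rule is stored in a local file named replication-rules.json:

$ cat replication-rules.json
{
    "rules": [
        {
            "destinations": [
                {
                    "region": "eu-north-1",
                    "registryId": "467886448844"
                }
            ]
        }
    ]
}
$ aws ecr put-replication-configuration --replication-configuration file://replication-rules.json --region eu-west-1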

Prepare repository and replication

First, we create a private Amazon ECR repository nginx to store the NGINX container image in the eu-west-1 region in the AWS console:

Figure 1) Amazon ECR repository for nginx.

We take note of the push command for the repository:

Figure 2) Push commands for the nginx repository.

We have already pulled the nginx image locally:

$ docker pull nginx:latest
latest: Pulling from library/nginx
3ae0c06b4d3a: Pull complete 
efe5035ea617: Pull complete 
a9b1bd25c37b: Pull complete 
f853dda6947e: Pull complete 
38f44e054f7b: Pull complete 
ed88a19ddb46: Pull complete 
495e6abbed48: Pull complete 
Digest: sha256:08bc36ad52474e528cc1ea3426b5e3f4bad8a130318e3140d6cfe29c8892c7ef
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest

So, we can push it to the Amazon ECR repository:

$ aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 467886448844.dkr.ecr.eu-west-1.amazonaws.com
Login Succeeded

Logging in with your password grants your terminal complete access to your account.
For better security, log in with a limited-privilege personal access token. Learn more at https://docs.docker.com/go/access-tokens/
$ docker tag nginx:latest 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
$ docker push 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
The push refers to repository [467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx]
9e96226c58e7: Pushed
12a568acc014: Pushed
7757099e19d2: Pushed
bf8b62fb2f13: Pushed
4ca29ffc4a01: Pushed
a83110139647: Pushed
ac4d164fef90: Pushed
latest: digest: sha256:d2b2f2980e9ccc570e5726b56b54580f23a018b7b7314c9eaff7e5e479c78657 size: 1778
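
Before relying on replication, we can optionally confirm that the pushed tag is present in the primary-region repository; list-images should report the same digest as the push above:

$ aws ecr list-images --repository-name nginx --region eu-west-1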

Using the AWS CLI, we can find the repositoryUri of the repository in the eu-west-1 region:

$ aws ecr describe-repositories --region eu-west-1
{
    "repositories": [
        {
            "repositoryArn": "arn:aws:ecr:eu-west-1:467886448844:repository/nginx",
            "registryId": "467886448844",
            "repositoryName": "nginx",
            "repositoryUri": "467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx",
            "createdAt": "2023-06-20T11:37:12+00:00",
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            },
            "encryptionConfiguration": {
                "encryptionType": "AES256"
            }
        }
    ]
}

(URI: 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx)
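
If you only need the URI, a JMESPath query returns it directly; note that describe-repositories takes the plural --repository-names flag when filtering by name:

$ aws ecr describe-repositories --region eu-west-1 --repository-names nginx --query 'repositories[0].repositoryUri' --output text
467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx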

Amazon ECR automatically created the nginx repository on the DR site due to the configured replication:

$ aws ecr describe-repositories --region eu-north-1
{
    "repositories": [
        {
            "repositoryArn": "arn:aws:ecr:eu-north-1:467886448844:repository/nginx",
            "registryId": "467886448844",
            "repositoryName": "nginx",
            "repositoryUri": "467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx",
            "createdAt": "2023-06-20T14:09:02+02:00",
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            },
            "encryptionConfiguration": {
                "encryptionType": "AES256"
            }
        }
    ]
}

And then automatically replicated the nginx image to the DR site eu-north-1:

$ aws ecr list-images --repository-name nginx --region eu-north-1
{
    "imageIds": [
        {
            "imageDigest": "sha256:d2b2f2980e9ccc570e5726b56b54580f23a018b7b7314c9eaff7e5e479c78657",
            "imageTag": "latest"
        }
    ]
}

Note that the repository URI on the DR site is different from the URI on the primary site: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx:latest. Therefore, in the event of a complete disaster at the primary site that also disables the private registry, we must make sure that the container images are pulled from the DR site’s repository. Otherwise, the applications will not start.

Deploy demo application

Now we can deploy the demo application on the EKS cluster on the primary site using the following manifest, which installs an NGINX deployment and a PV backed by Amazon FSx for NetApp ONTAP into the namespace demo. The NGINX container image is pulled from the Amazon ECR repository in the eu-west-1 region:

$ cat sample-app.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: v1
kind: Service
metadata:
  name: demo-service
  namespace: demo
  labels:
    app: demo
spec:
  ports:
    - port: 80
  selector:
    app: demo
    tier: frontend
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  namespace: demo
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
      tier: frontend
  template:
    metadata:
      labels:
        app: demo
        tier: frontend
    spec:
      containers:
        - image: 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
          imagePullPolicy: Always
          name: demo
          ports:
            - containerPort: 80
              name: demo
          volumeMounts:
          - mountPath: /data
            name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: nginxdata
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginxdata
  namespace: demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: fsx-netapp-file

$ kubectl apply -f sample-app.yaml
namespace/demo created
service/demo-service created
deployment.apps/demo-deployment created
persistentvolumeclaim/nginxdata created

We check that the deployment was successful:

$ kubectl get all,pvc -n demo
NAME                                   READY   STATUS    RESTARTS   AGE
pod/demo-deployment-7bd7d8f4cf-5dh89   1/1     Running   0          106s
 
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/demo-service   LoadBalancer   172.20.56.137   <pending>     80:31099/TCP   107s
 
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo-deployment   1/1     1            1           106s
 
NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-deployment-7bd7d8f4cf   1         1         1       106s
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-34fe0bf5-5a7f-49a9-bd46-f8af03a61f18   2Gi        RWX            fsx-netapp-file   <unset>                 107s

And that the NGINX container image was pulled from the correct repository:

$ kubectl -n demo describe pod/demo-deployment-7bd7d8f4cf-5dh89 | grep Image:
    Image:          467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest

Manage and protect demo application with Trident protect

The EKS cluster eks-demo1-euwest1-cluster on which we deployed the demo application already has Trident protect and its CLI tridentctl-protect installed, and an AWS S3 bucket is configured as the AppVault CR pu-demo to store application backup data. Therefore, we can manage the demo application with Trident protect simply by defining its namespace as an application named demo:

$ tridentctl-protect create app demo --namespaces demo -n demo
Application "demo" created.
 
$ tridentctl-protect get application -A
+-----------+------+------------+-------+-----+
| NAMESPACE | NAME | NAMESPACES | STATE | AGE |
+-----------+------+------------+-------+-----+
| demo      | demo | demo       | Ready | 6s  |
+-----------+------+------------+-------+-----+
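
For reference, the pu-demo AppVault referenced above is simply a CR pointing to the S3 bucket. A minimal sketch, assuming an access-key Secret named s3-secret already exists in the trident-protect namespace (names and regions are our own choices; adjust them to your environment):

apiVersion: protect.trident.netapp.io/v1
kind: AppVault
metadata:
  name: pu-demo
  namespace: trident-protect
spec:
  providerType: AWS
  providerConfig:
    s3:
      bucketName: pu-demo
      endpoint: s3.eu-west-1.amazonaws.com
  providerCredentials:
    accessKeyID:
      valueFromSecret:
        key: accessKeyID
        name: s3-secret
    secretAccessKey:
      valueFromSecret:
        key: secretAccessKey
        name: s3-secret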

To regularly protect the demo application, we create a protection schedule with hourly backups to the AWS S3 bucket:

$ tridentctl-protect create schedule --app demo --appvault pu-demo --backup-retention 3 --granularity Hourly --minute 30 --snapshot-retention 3 -n demo
Schedule "demo-q5bi0f" created.
$ tridentctl-protect get schedule -n demo
+-------------+------+---------------+---------+-------+-------+-----+
|    NAME     | APP  |   SCHEDULE    | ENABLED | STATE | ERROR | AGE |
+-------------+------+---------------+---------+-------+-------+-----+
| demo-q5bi0f | demo | Hourly:min=30 | true    |       |       | 8s  |
+-------------+------+---------------+---------+-------+-------+-----+

Configure Trident protect hook components

To change the container image URLs from region eu-west-1 to region eu-north-1 after a restore, we adapt the post-restore URL-rewrite hook from our collection of example execution hooks in the Verda GitHub project. It consists of two parts: the actual post-restore execution hook script url-rewrite.sh, which swaps all container image URLs between two registries when invoked, and a hook execution container definition rewrite-infra.yaml, which we need to modify to fit our environment.
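
Conceptually, the rewrite boils down to swapping the registry prefix of every container image in the application's deployments. A simplified sketch of that logic (not the actual url-rewrite.sh from Verda), assuming the two registry prefixes are passed as arguments:

#!/bin/sh
# Simplified sketch: swap the registry prefix of all deployment images in the
# current namespace between the two registries passed as $1 and $2.
REGISTRY_A="$1"
REGISTRY_B="$2"

for deploy in $(kubectl get deployments -o name); do
  kubectl get "$deploy" -o jsonpath='{range .spec.template.spec.containers[*]}{.name}={.image}{"\n"}{end}' |
  while IFS='=' read -r name image; do
    case "$image" in
      "$REGISTRY_A"*) new_image="$REGISTRY_B${image#"$REGISTRY_A"}" ;;
      "$REGISTRY_B"*) new_image="$REGISTRY_A${image#"$REGISTRY_B"}" ;;
      *) continue ;;
    esac
    # Patching the image triggers a rollout of the deployment with the new registry.
    kubectl set image "$deploy" "$name=$new_image"
  done
done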

 

Because the url-rewrite.sh execution hook needs to run in a container with the Kubernetes CLI installed, the helper manifest deploys a generic Alpine container and installs kubectl in it. It also creates a ServiceAccount and a RoleBinding with the necessary permissions in the application namespace.

 

To adapt and deploy the hook components, we clone the Verda GitHub repository and change into the Verda/URL-rewrite directory:

$ git clone https://github.com/NetApp/Verda.git
Cloning into 'Verda'...
remote: Enumerating objects: 317, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 317 (delta 54), reused 30 (delta 30), pack-reused 233 (from 1)
Receiving objects: 100% (317/317), 88.24 KiB | 1.76 MiB/s, done.
Resolving deltas: 100% (151/151), done.
 
$ cd Verda/URL-rewrite

First, we need to adapt the manifest for the helper tools to our sample application. We make sure that the namespace values are set to the namespace demo of the sample app and that the labels fit our application needs:

$ cat rewrite-infra.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tp-hook-deployment
  namespace: demo
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      serviceAccountName: kubectl-ns-admin-sa
      containers:
      - name: alpine-tp-hook
        image: alpine:latest
        env:
          - name: KUBECTL_VERSION
            value: "1.23.9"
        command: ["/bin/sh"]
        args:
        - -c
        - >
          apk add curl jq &&
          curl -sLO https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl &&
          mv kubectl /usr/bin/kubectl &&
          chmod +x /usr/bin/kubectl &&
          trap : TERM INT; sleep infinity & wait

Assuming that we’re not allowed to pull the Alpine image from a public registry, we also create a (replicated) private repository for it and push the image there from our local machine, following the same steps as for the NGINX image:

$ docker push 467886448844.dkr.ecr.eu-west-1.amazonaws.com/alpine:latest
The push refers to repository [467886448844.dkr.ecr.eu-west-1.amazonaws.com/alpine]
78a822fe2a2d: Pushed
latest: digest: sha256:25fad2a32ad1f6f510e528448ae1ec69a28ef81916a004d3629874104f8a7f70 size: 528
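
For reference, the preceding steps look like this, assuming the alpine repository does not yet exist in eu-west-1:

$ aws ecr create-repository --repository-name alpine --region eu-west-1
$ docker pull alpine:latest
$ aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 467886448844.dkr.ecr.eu-west-1.amazonaws.com
$ docker tag alpine:latest 467886448844.dkr.ecr.eu-west-1.amazonaws.com/alpine:latest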

We find that the repositoryUri of the alpine ECR repository on the DR site eu-north-1 is 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:

$ aws ecr describe-repositories --region eu-north-1 --repository-names alpine
{
    "repositories": [
        {
            "repositoryArn": "arn:aws:ecr:eu-north-1:467886448844:repository/alpine",
            "registryId": "467886448844",
            "repositoryName": "alpine",
            "repositoryUri": "467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine",
            "createdAt": "2023-06-26T12:54:40+02:00",
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            },
            "encryptionConfiguration": {
                "encryptionType": "AES256"
            }
        }
    ]
}

With the Alpine image replicated, we adapt the helper deployment manifest to pull it from the private registry. Because the hook must also be able to run on the DR site when the primary registry is unavailable, we reference the replicated alpine repository in eu-north-1 directly:

$ cat rewrite-infra-ECR-DR.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tp-hook-deployment
  namespace: demo
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      serviceAccountName: kubectl-ns-admin-sa
      containers:
      - name: alpine-tp-hook
        image: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
        env:
          - name: KUBECTL_VERSION
            value: "1.23.9"
        command: ["/bin/sh"]
        args:
        - -c
        - >
          apk add curl jq &&
          curl -sLO https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl &&
          mv kubectl /usr/bin/kubectl &&
          chmod +x /usr/bin/kubectl &&
          trap : TERM INT; sleep infinity & wait

We can now deploy the hook components into the namespace of the sample application and confirm that the helper pod is running:

$ kubectl apply -f rewrite-infra-ECR-DR.yaml
serviceaccount/kubectl-ns-admin-sa created
rolebinding.rbac.authorization.k8s.io/kubectl-ns-admin-sa created
deployment.apps/tp-hook-deployment created
 
$ kubectl get all,pvc -n demo
NAME                                     READY   STATUS    RESTARTS   AGE
pod/demo-deployment-7bd7d8f4cf-5dh89     1/1     Running   0          43m
pod/tp-hook-deployment-6cbdc996f-pprdc   1/1     Running   0          11s
 
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/demo-service   LoadBalancer   172.20.56.137   <pending>     80:31099/TCP   43m
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo-deployment      1/1     1            1           43m
deployment.apps/tp-hook-deployment   1/1     1            1           11s
 
NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-deployment-7bd7d8f4cf     1         1         1       43m
replicaset.apps/tp-hook-deployment-6cbdc996f   1         1         1       11s
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-34fe0bf5-5a7f-49a9-bd46-f8af03a61f18   2Gi        RWX            fsx-netapp-file   <unset>                 43m
 
NAME                                                                                                                            READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS           SNAPSHOTCONTENT                                    CREATIONTIME   AGE
volumesnapshot.snapshot.storage.k8s.io/snapshot-01f9facb-580f-4b25-9d49-e149c55fd540-pvc-34fe0bf5-5a7f-49a9-bd46-f8af03a61f18   true         nginxdata                           312Ki         csi-trident-snapclass   snapcontent-1b0794ca-9ad4-427a-9f5d-c7b196dfef33   21m            21m

After confirming that the Alpine image was pulled from the private ECR repository on the DR site, we’re all set to add the post-restore URL-rewrite hook to the sample application:

$ kubectl -n demo describe pod/tp-hook-deployment-6cbdc996f-pprdc | grep Image:
    Image:         467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest

Add post-restore execution hook

In the next step, we create the post-restore hook post-restore-url-rewrite with these details:

  1. Action:
    • Restore
  2. Stage:
    • Post
  3. App:
    • demo
  4. Source file
    • Path to the hook script
  5. Hook arguments (mandatory for this specific hook):
    • region A (467886448844.dkr.ecr.eu-west-1.amazonaws.com) and region B (467886448844.dkr.ecr.eu-north-1.amazonaws.com). Order does not matter.
  6. Container matches (defines the container in which the hook script will be executed):
    • Hook filter type: We select containerName (with the value alpine-tp-hook) from the available options of containerImage, containerName, podName, podLabel, and namespaceName.

With these details, the CLI command to create the execution hook in the demo namespace is:

$ tridentctl-protect create exechook post-restore-url-rewrite --action Restore --stage Post --app demo --source-file ./url-rewrite.sh --arg 467886448844.dkr.ecr.eu-west-1.amazonaws.com --arg 467886448844.dkr.ecr.eu-north-1.amazonaws.com --match containerName:alpine-tp-hook -n demo
ExecHook "post-restore-url-rewrite" created.

We verify that the hook was created successfully and is in the ENABLED state:

$ tridentctl-protect get exechook -n demo
+--------------------------+------+------------------------------+---------+-------+---------+-------+-----+
|           NAME           | APP  |            MATCH             | ACTION  | STAGE | ENABLED | ERROR | AGE |
+--------------------------+------+------------------------------+---------+-------+---------+-------+-----+
| post-restore-url-rewrite | demo | containerName:alpine-tp-hook | Restore | Post  | true    |       | 47s |
+--------------------------+------+------------------------------+---------+-------+---------+-------+-----+

Test application restore

Now we can test application restores to the DR site eu-north-1 and locally to the same cluster.

Restore to DR site

To test a restore to the DR site, we use a second cluster eks-demo2-eunorth1-cluster in the DR location eu-north-1 with Trident protect installed on it. The DR cluster has an AppVault CR pu-demo configured that points to the same AWS S3 bucket pu-demo hosting the backups taken from the primary cluster:

$ tridentctl-protect get appvault
+---------+----------+-----------+-------+---------+-----+
|  NAME   | PROVIDER |   STATE   | ERROR | MESSAGE | AGE |
+---------+----------+-----------+-------+---------+-----+
| pu-demo | AWS      | Available |       |         | 6s  |
+---------+----------+-----------+-------+---------+-----+ 

We can list the available application backups in the S3 bucket from the DR cluster:

$ tridentctl-protect get appvaultcontent pu-demo --show-paths --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster
+---------------+------+--------+-----------------------------+-----------+---------------------------+--------------------------------------------------------------------------------------------------------------------+
|    CLUSTER    | APP  |  TYPE  |            NAME             | NAMESPACE |         TIMESTAMP         |                                                        PATH                                                        |
+---------------+------+--------+-----------------------------+-----------+---------------------------+--------------------------------------------------------------------------------------------------------------------+
| demo1-euwest1 | demo | backup | hourly-bdd12-20250424053000 | demo      | 2025-04-24 05:31:38 (UTC) | demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424053000_5ec3eebc-62cf-4fb4-b6aa-96c37985692d |
| demo1-euwest1 | demo | backup | hourly-bdd12-20250424063000 | demo      | 2025-04-24 06:31:42 (UTC) | demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424063000_e37cbdcc-680a-44a6-9a7f-9c5d4eb38a18 |
| demo1-euwest1 | demo | backup | hourly-bdd12-20250424073000 | demo      | 2025-04-24 07:31:37 (UTC) | demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424073000_3e294f60-67f3-4a9e-87bd-2ea64a91be63 |
+---------------+------+--------+-----------------------------+-----------+---------------------------+--------------------------------------------------------------------------------------------------------------------+

We choose the most recent backup, hourly-bdd12-20250424073000, and start the restore on the DR cluster (into the original application namespace demo) by passing the backup’s archive path to the CLI command. Then we watch the progress of the restore:

$ tridentctl-protect create backuprestore --appvault pu-demo --path demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424073000_3e294f60-67f3-4a9e-87bd-2ea64a91be63 --namespace-mapping demo:demo --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster -n demo
BackupRestore "demo-a4xvta" created.
 
$ kubectl -n demo get BackupRestore demo-a4xvta -w
NAME          STATE     ERROR   AGE
demo-a4xvta   Running           14s
demo-a4xvta   Running           26s
demo-a4xvta   Running           26s
demo-a4xvta   Running           37s
demo-a4xvta   Running           37s
demo-a4xvta   Running           37s
demo-a4xvta   Running           37s
demo-a4xvta   Running           37s
demo-a4xvta   Running           46s
demo-a4xvta   Running           46s
demo-a4xvta   Running           46s
demo-a4xvta   Running           46s
demo-a4xvta   Running           61s
demo-a4xvta   Running           61s
demo-a4xvta   Running           70s
demo-a4xvta   Running           70s
demo-a4xvta   Running           71s
demo-a4xvta   Running           71s
demo-a4xvta   Running           71s
demo-a4xvta   Completed           71s

The demo app comes up on the DR cluster eks-demo2-eunorth1-cluster:

$ kubectl get all,pvc -n demo --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster
NAME                                      READY   STATUS    RESTARTS   AGE
pod/demo-deployment-5df8dcc7df-79t6m      1/1     Running   0          4h7m
pod/tp-hook-deployment-66bf969f5c-pln4q   1/1     Running   0          4h7m
 
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/demo-service   LoadBalancer   172.20.94.103   <pending>     80:31013/TCP   4h7m
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo-deployment      1/1     1            1           4h7m
deployment.apps/tp-hook-deployment   1/1     1            1           4h7m
 
NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-deployment-5df8dcc7df      1         1         1       4h7m
replicaset.apps/demo-deployment-7bd7d8f4cf      0         0         0       4h7m
replicaset.apps/tp-hook-deployment-66bf969f5c   1         1         1       4h7m
replicaset.apps/tp-hook-deployment-6cbdc996f    0         0         0       4h7m
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-37a929d9-34d8-443d-a230-8eee73e02fe3   2Gi        RWX            fsx-netapp-file   <unset>                 4h7m

The detailed steps of the execution hook run can be seen in the ExecHooksRun CR that’s created during the restore:

$ kubectl -n demo get exechooksrun --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster
NAME                       STATE       STAGE   ACTION    ERROR   APP    AGE
post-restore-demo-a4xvta   Completed   Post    Restore           demo   4h11m
 
$ kubectl -n demo get exechooksrun post-restore-demo-a4xvta --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster -o yaml | yq '.status'
completionTimeout: 30m0s
completionTimestamp: "2025-04-24T08:29:08Z"
conditions:
  - lastTransitionTime: "2025-04-24T08:29:01Z"
    message: Found 1 matching container/exechook pairs
    reason: Done
    status: "True"
    type: RetrievedMatchingContainers
  - lastTransitionTime: "2025-04-24T08:29:01Z"
    message: Successfully reconciled
    reason: Done
    status: "True"
    type: WaitForReadiness
  - lastTransitionTime: "2025-04-24T08:29:08Z"
    message: Successfully reconciled
    reason: Done
    status: "True"
    type: ProcessMatchingContainers
  - lastTransitionTime: "2025-04-24T08:29:08Z"
    message: Successfully reconciled
    reason: Done
    status: "True"
    type: Completed
matchingContainers:
  - completionTimestamp: "2025-04-24T08:29:08Z"
    containerImage: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
    containerName: alpine-tp-hook
    execHookRef: post-restore-url-rewrite
    execHookUID: afd07266-bd1a-4d7c-bb0d-1f436b5c36c3
    jobName: ehr-fc46237de1c8444a199d4800cda4b165
    namespace: demo
    podName: tp-hook-deployment-6cbdc996f-4q5mq
    podUID: fb14ed00-17a5-46b0-9662-0a966c514c37
    startTimestamp: "2025-04-24T08:29:01Z"
state: Completed

Checking the container image of the sample application, we see that it was indeed pulled from the nginx ECR repository on the DR site eu-north-1:

$ kubectl -n demo describe pod/demo-deployment-5df8dcc7df-79t6m --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster | grep Image:
    Image:          467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx:latest

Local restore

When doing a local restore, either to the same cluster eks-demo1-euwest1-cluster or to another cluster in the primary location, the post-restore URL-rewrite hook would also be executed. If pulling the container images from the DR site’s registry is not wanted or possible for local restores, you have two options: add logic to the hook that skips the image location change during a local restore, or disable the post-restore URL-rewrite hook and execute it manually only when needed after a restore. We show a sketch of the first option below, followed by the procedure to disable the hook and execute it manually.
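
One way to add such logic is to give each cluster a small site marker the hook can query, and to skip the rewrite on the primary site. A hedged sketch for the top of the hook script; the site-info ConfigMap and its region key are our own convention, not part of the Verda example:

# Hypothetical guard at the top of url-rewrite.sh: skip the rewrite when the
# cluster identifies itself as the primary site via a per-cluster ConfigMap.
PRIMARY_REGION="eu-west-1"
CURRENT_REGION=$(kubectl get configmap site-info -o jsonpath='{.data.region}' 2>/dev/null)
if [ "$CURRENT_REGION" = "$PRIMARY_REGION" ]; then
    echo "Running on the primary site (${CURRENT_REGION}), skipping image URL rewrite."
    exit 0
fi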

 

To disable the post-restore-url-rewrite hook on the primary cluster after its creation, we patch it and set the /spec/enabled value to false:

$ kubectl -n demo get exechooks
NAME                       STAGE   ACTION    APP    ENABLED   ERROR   AGE
post-restore-url-rewrite   Post    Restore   demo   true              17h
 
$ kubectl patch execHook post-restore-url-rewrite -n demo --type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": false}]'
exechook.protect.trident.netapp.io/post-restore-url-rewrite patched
 
$ kubectl -n demo get exechooks
NAME                       STAGE   ACTION    APP    ENABLED   ERROR   AGE
post-restore-url-rewrite   Post    Restore   demo   false             17h

The disabled state of the hook is preserved in any subsequent backups, so the hook won’t be executed upon restore. Let’s test this by running an on-demand backup named hook-disabled:

$ tridentctl-protect create backup hook-disabled --app demo --appvault pu-demo -n demo
Backup "hook-disabled" created.

Now we restore from this backup into a new namespace demo2 after finding the backup path in the AppVault:

$ tridentctl-protect get appvaultcontent pu-demo --show-paths
+---------------+------+--------+---------------+-----------+---------------------------+------------------------------------------------------------------------------------------------------+
|    CLUSTER    | APP  |  TYPE  |     NAME      | NAMESPACE |         TIMESTAMP         |                                                 PATH                                                 |
+---------------+------+--------+---------------+-----------+---------------------------+------------------------------------------------------------------------------------------------------+
| demo1-euwest1 | demo | backup | hook-disabled | demo      | 2025-04-30 15:08:31 (UTC) | demo_2f2472e5-be17-40e3-a04b-166f27cb61f3/backups/hook-disabled_916c0058-4436-483e-bf94-c3ad541449ac |
+---------------+------+--------+---------------+-----------+---------------------------+------------------------------------------------------------------------------------------------------+
 
$ tridentctl-protect create backuprestore --appvault pu-demo --path demo_2f2472e5-be17-40e3-a04b-166f27cb61f3/backups/hook-disabled_916c0058-4436-483e-bf94-c3ad541449ac --namespace-mapping demo:demo2 -n demo2
BackupRestore "demo-jusg1t" created.

The restore completes after a few minutes. Once the application is up and running, we can confirm that the image location was not changed and that the container images were pulled from the ECR repository on the primary site:

$ kubectl get all,pvc -n demo2
NAME                                     READY   STATUS    RESTARTS   AGE
pod/demo-deployment-7bd7d8f4cf-qxqp9     1/1     Running   0          49s
pod/tp-hook-deployment-6cbdc996f-fp6z7   1/1     Running   0          49s
 
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/demo-service   LoadBalancer   172.20.31.179   <pending>     80:31132/TCP   49s
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo-deployment      1/1     1            1           49s
deployment.apps/tp-hook-deployment   1/1     1            1           49s
 
NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-deployment-7bd7d8f4cf     1         1         1       49s
replicaset.apps/tp-hook-deployment-6cbdc996f   1         1         1       49s
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-e81cae8b-9acc-4cf8-bb0b-bbd6d649d7df   2Gi        RWX            fsx-netapp-file   <unset>                 51s
 
$ kubectl -n demo2 describe pod/demo-deployment-7bd7d8f4cf-qxqp9 | grep Image:
    Image:          467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest

A “Found 0 matching container/exechook pairs” message in the status of the execution hook run confirms that the hook was not executed:

$ kubectl -n demo2 get exechooksrun
NAME                       STATE       STAGE   ACTION    ERROR   APP    AGE
post-restore-demo-jusg1t   Completed   Post    Restore           demo   2m44s
 
$ kubectl -n demo2 describe exechooksrun post-restore-demo-jusg1t
Name:         post-restore-demo-jusg1t
Namespace:    demo2
Labels:       <none>
Annotations:  protect.trident.netapp.io/correlationid: dc8ab4d2-11ad-4d26-a69c-785637ae6501
API Version:  protect.trident.netapp.io/v1
Kind:         ExecHooksRun
Metadata:
  Creation Timestamp:  2025-04-30T15:14:21Z
  Finalizers:
    protect.trident.netapp.io/delete-jobs
  Generation:  1
  Owner References:
    API Version:           protect.trident.netapp.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  BackupRestore
    Name:                  demo-jusg1t
    UID:                   480b9773-2f53-46e8-9a0c-ee63e78cbf41
  Resource Version:        348037
  UID:                     be1ace14-b6f4-4cc3-8864-0d43da59b85d
Spec:
  Action:              Restore
  App Archive Path:    demo_cbfb8846-fb7a-4b4c-b583-183ece3a89d5/resourcebackups/protect-resource-backup-480b9773-2f53-46e8-9a0c-ee63e78cbf41_424312dc-562e-48c9-b78c-abad75cb0e52
  App Vault Ref:       pu-demo
  Application Ref:     demo
  Completion Timeout:  0s
  Resource Filter:
  Stage:  Post
Status:
  Completion Timeout:    0s
  Completion Timestamp:  2025-04-30T15:14:21Z
  Conditions:
    Last Transition Time:  2025-04-30T15:14:21Z
    Message:               Found 0 matching container/exechook pairs
    Reason:                Done
    Status:                True
    Type:                  RetrievedMatchingContainers
    Last Transition Time:  2025-04-30T15:14:21Z
    Message:               No matching containers to wait for
    Reason:                Done
    Status:                True
    Type:                  WaitForReadiness
    Last Transition Time:  2025-04-30T15:14:21Z
    Message:               Successfully reconciled
    Reason:                Done
    Status:                True
    Type:                  ProcessMatchingContainers
    Last Transition Time:  2025-04-30T15:14:21Z
    Message:               Successfully reconciled
    Reason:                Done
    Status:                True
    Type:                  Completed
  Matching Containers:
  State:  Completed
Events:   <none>

To execute the post-restore-url-rewrite hook manually after the restore of the application into the demo2 namespace, we need to re-enable the hook in the demo2 namespace, take a backup of only the application resources (without the persistent volumes), and then create an ExecHooksRun CR to run the hook.

Let’s enable the hook by patching it again:

$ kubectl -n demo2 get exechooks
NAME                       STAGE   ACTION    APP    ENABLED   ERROR   AGE
post-restore-url-rewrite   Post    Restore   demo   false             23h

$ kubectl patch execHook post-restore-url-rewrite -n demo2 --type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": true}]'
exechook.protect.trident.netapp.io/post-restore-url-rewrite patched
 
$ kubectl -n demo2 get exechooks
NAME                       STAGE   ACTION    APP    ENABLED   ERROR   AGE
post-restore-url-rewrite   Post    Restore   demo   true              23h

Now we create a ResourceBackup with the following command; the --app-archive-path value can be chosen arbitrarily:

$ tridentctl-protect create resourcebackup --app demo --appvault pu-demo --app-archive-path demo_hook_enabled/resourcebackups/my_res_backup -n demo2
ResourceBackup "demo-kt7iv1" created.
 
$ kubectl -n demo2 get ResourceBackup demo-kt7iv1
NAME          STATE       ERROR   AGE
demo-kt7iv1   Completed           45s

With the AWS CLI, we can check the content of the S3 bucket and the ResourceBackup:

$ aws s3 ls s3://pu-demo
                           PRE demo_2f2472e5-be17-40e3-a04b-166f27cb61f3/
                           PRE demo_hook_enabled/
2025-04-23 14:05:06         39 appVault.json
 
$ aws s3 ls s3://pu-demo/demo_hook_enabled/ --recursive
2025-04-30 17:45:20       1049 demo_hook_enabled/resourcebackups/my_res_backup/application.json
2025-04-30 17:45:20       4860 demo_hook_enabled/resourcebackups/my_res_backup/exec_hooks.json
2025-04-30 17:45:20       2523 demo_hook_enabled/resourcebackups/my_res_backup/resource_backup.json
2025-04-30 17:45:22      11138 demo_hook_enabled/resourcebackups/my_res_backup/resource_backup.tar.gz
2025-04-30 17:45:22       6235 demo_hook_enabled/resourcebackups/my_res_backup/resource_backup_summary.json

Finally, we create a manual ExechooksRun using the Trident protect CLI:

$ tridentctl-protect create exechooksrun --action restore --stage post --app demo --appvault pu-demo --path demo_hook_enabled/resourcebackups/my_res_backup -n demo2
ExecHooksRun "ehr-m5w8tj" created.

The execution hook run finishes within a couple of seconds, and we can confirm that it ran successfully in the alpine-tp-hook container:

$ kubectl  -n demo2 get ExecHooksRun ehr-m5w8tj
NAME         STATE       STAGE   ACTION    ERROR   APP    AGE
ehr-m5w8tj   Completed   Post    Restore           demo   18s

$ kubectl -n demo2 describe ExecHooksRun ehr-m5w8tj
Name:         ehr-m5w8tj
Namespace:    demo2
Labels:       <none>
Annotations:  <none>
API Version:  protect.trident.netapp.io/v1
Kind:         ExecHooksRun
Metadata:
  Creation Timestamp:  2025-04-30T15:47:57Z
  Finalizers:
    protect.trident.netapp.io/delete-jobs
  Generation:        1
  Resource Version:  356207
  UID:               9e1240ed-fa95-44c3-b50e-bdf45cdba876
Spec:
  Action:              Restore
  App Archive Path:    demo_hook_enabled/resourcebackups/my_res_backup
  App Vault Ref:       pu-demo
  Application Ref:     demo
  Completion Timeout:  0s
  Resource Filter:
  Stage:  Post
Status:
  Completion Timeout:    30m0s
  Completion Timestamp:  2025-04-30T15:48:01Z
  Conditions:
    Last Transition Time:  2025-04-30T15:47:57Z
    Message:               Found 1 matching container/exechook pairs
    Reason:                Done
    Status:                True
    Type:                  RetrievedMatchingContainers
    Last Transition Time:  2025-04-30T15:47:57Z
    Message:               Successfully reconciled
    Reason:                Done
    Status:                True
    Type:                  WaitForReadiness
    Last Transition Time:  2025-04-30T15:48:01Z
    Message:               Successfully reconciled
    Reason:                Done
    Status:                True
    Type:                  ProcessMatchingContainers
    Last Transition Time:  2025-04-30T15:48:01Z
    Message:               Successfully reconciled
    Reason:                Done
    Status:                True
    Type:                  Completed
  Matching Containers:
    Completion Timestamp:  2025-04-30T15:48:01Z
    Container Image:       467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
    Container Name:        alpine-tp-hook
    Exec Hook Ref:         post-restore-url-rewrite
    Exec Hook UID:         090571f6-fb12-48ce-b205-8f9e9d08a947
    Job Name:              ehr-74c2b8641add0aba8742b5e2a046ff36
    Namespace:             demo2
    Pod Name:              tp-hook-deployment-6cbdc996f-fp6z7
    Pod UID:               f483a0ff-88bf-4010-b75e-81560f2b6fa6
    Start Timestamp:       2025-04-30T15:47:57Z
  State:                   Completed
Events:                    <none>

Lastly, let’s confirm that the image was now actually pulled from the repository on the DR site:

$ kubectl get all,pvc -n demo2
NAME                                      READY   STATUS    RESTARTS   AGE
pod/demo-deployment-5df8dcc7df-ttjrv      1/1     Running   0          99s
pod/tp-hook-deployment-66bf969f5c-hg5xx   1/1     Running   0          99s
 
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/demo-service   LoadBalancer   172.20.31.179   <pending>     80:31132/TCP   35m
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo-deployment      1/1     1            1           35m
deployment.apps/tp-hook-deployment   1/1     1            1           35m
 
NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-deployment-5df8dcc7df      1         1         1       100s
replicaset.apps/demo-deployment-7bd7d8f4cf      0         0         0       35m
replicaset.apps/tp-hook-deployment-66bf969f5c   1         1         1       100s
replicaset.apps/tp-hook-deployment-6cbdc996f    0         0         0       35m
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-e81cae8b-9acc-4cf8-bb0b-bbd6d649d7df   2Gi        RWX            fsx-netapp-file   <unset>                 35m
 
$ kubectl describe pod/demo-deployment-5df8dcc7df-ttjrv -n demo2 | grep Image:
    Image:          467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx:latest

Depending on your requirements, you may want to disable the execution hook of the restored application again:

$ kubectl patch execHook post-restore-url-rewrite -n demo2 --type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": false}]'
exechook.protect.trident.netapp.io/post-restore-url-rewrite patched
 
$ kubectl -n demo2 get exechooks
NAME                       STAGE   ACTION    APP    ENABLED   ERROR   AGE
post-restore-url-rewrite   Post    Restore   demo   false             37m

Conclusion

In certain scenarios, it’s crucial to change K8s application definitions after a restore. With its execution hooks framework, Trident protect offers custom actions that can be configured to run in conjunction with a data protection operation of a managed app.

 

Trident protect supports the following types of execution hooks, based on when they can be run:

  • Pre-snapshot
  • Post-snapshot
  • Pre-backup
  • Post-backup
  • Post-restore
  • Post-failover

The Verda GitHub project contains a collection of example execution hooks for various applications and scenarios.

 

In this blog post, we showed how to leverage execution hooks to change container image URLs after an application restore to a DR site with a different repository URL, following the sample post-restore URL-rewrite hook in Verda; how to disable execution hooks; and how to run them manually. The same mechanism can, for example, be used to change an ingress configuration after a restore.
