Tech ONTAP Blogs
Disaster recovery for business-critical Kubernetes applications often requires using replicated private registries to pull the container images locally on the DR site in case of a complete failure (including the registry) of the primary site. This can be the case for both on-premises and cloud-based Kubernetes deployments. Therefore, it’s essential for the backup system used to protect these critical Kubernetes applications to have the ability to modify Kubernetes configurations after a restore. That’s also important for other aspects that might need to be changed on the DR site, like ingress configuration.
NetApp® Trident™ protect provides application-aware data protection, mobility, and disaster recovery for any workload running on any Kubernetes distribution, leveraging NetApp’s proven and expansive storage portfolio in the public cloud and on premises. Trident protect enables administrators to easily protect, back up, migrate, and create working clones of Kubernetes applications, through either its CLI or Kubernetes-native custom resource definitions (CRDs).
Trident protect offers various types of execution hooks—custom scripts that you can configure to run in conjunction with a data protection operation of a managed app. With a post-restore hook, you can for example change the container image URL after an application restore to a DR site. Read on to find out how.
In this blog, we use the post-restore URL rewrite hook example with Amazon Elastic Container Registry (ECR) cross-region replication (CRR) to demonstrate how to restore an NGINX sample application. The sample application is originally running on an Amazon Elastic Kubernetes Service (EKS) cluster in the eu-west-1 region and shall be failed over to a DR cluster in the eu-north-1 region in case of a disaster. The NGINX container image is to be pulled from private image repositories in the respective regions.
After creating a private registry in Amazon Web Services (AWS), we followed the steps in the AWS documentation to configure private image replication from eu-west-1 to the eu-north-1 region:
$ aws ecr describe-registry --region eu-west-1
{
"registryId": "467886448844",
"replicationConfiguration": {
"rules": [
{
"destinations": [
{
"region": "eu-north-1",
"registryId": "467886448844"
}
]
}
]
}
}
Now all content pushed to repositories in eu-west-1 is automatically replicated to eu-north-1. Amazon ECR keeps the destination and source synchronized.
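For reference, a replication rule like the one shown above can also be configured directly with the AWS CLI instead of the console. A minimal sketch, assuming the same account ID and regions as in our setup:
$ aws ecr put-replication-configuration --region eu-west-1 \
    --replication-configuration '{
      "rules": [
        {
          "destinations": [
            { "region": "eu-north-1", "registryId": "467886448844" }
          ]
        }
      ]
    }'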
First, we create a private Amazon ECR repository nginx to store the NGINX container image in the eu-west-1 region in the AWS console:
Figure 1) Amazon ECR repository for nginx.
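If you prefer the AWS CLI over the console, the equivalent repository creation would look like this:
$ aws ecr create-repository --repository-name nginx --region eu-west-1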
We take note of the push command for the repository:
Figure 2) Push commands for the nginx repository.
We have already pulled the nginx image to our local repository:
$ docker pull nginx:latest
latest: Pulling from library/nginx
3ae0c06b4d3a: Pull complete
efe5035ea617: Pull complete
a9b1bd25c37b: Pull complete
f853dda6947e: Pull complete
38f44e054f7b: Pull complete
ed88a19ddb46: Pull complete
495e6abbed48: Pull complete
Digest: sha256:08bc36ad52474e528cc1ea3426b5e3f4bad8a130318e3140d6cfe29c8892c7ef
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest
So, we can push it to the Amazon ECR repository:
$ aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 467886448844.dkr.ecr.eu-west-1.amazonaws.com
Login Succeeded
Logging in with your password grants your terminal complete access to your account.
For better security, log in with a limited-privilege personal access token. Learn more at https://docs.docker.com/go/access-tokens/
$ docker tag nginx:latest 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
$ docker push 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
The push refers to repository [467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx]
9e96226c58e7: Pushed
12a568acc014: Pushed
7757099e19d2: Pushed
bf8b62fb2f13: Pushed
4ca29ffc4a01: Pushed
a83110139647: Pushed
ac4d164fef90: Pushed
latest: digest: sha256:d2b2f2980e9ccc570e5726b56b54580f23a018b7b7314c9eaff7e5e479c78657 size: 1778
Using the AWS CLI, we can find the repositoryUri of the repository in the eu-west-1 region:
$ aws ecr describe-repositories --region eu-west-1
{
"repositories": [
{
"repositoryArn": "arn:aws:ecr:eu-west-1:467886448844:repository/nginx",
"registryId": "467886448844",
"repositoryName": "nginx",
"repositoryUri": "467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx",
"createdAt": "2023-06-20T11:37:12+00:00",
"imageTagMutability": "MUTABLE",
"imageScanningConfiguration": {
"scanOnPush": false
},
"encryptionConfiguration": {
"encryptionType": "AES256"
}
}
]
}
So the repository URI on the primary site is 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx.
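The URI can also be extracted directly with a JMESPath query, for example:
$ aws ecr describe-repositories --repository-names nginx --region eu-west-1 \
    --query 'repositories[0].repositoryUri' --output text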
Amazon ECR automatically created the nginx repository on the DR site due to the configured replication:
$ aws ecr describe-repositories --region eu-north-1
{
"repositories": [
{
"repositoryArn": "arn:aws:ecr:eu-north-1:467886448844:repository/nginx",
"registryId": "467886448844",
"repositoryName": "nginx",
"repositoryUri": "467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx",
"createdAt": "2023-06-20T14:09:02+02:00",
"imageTagMutability": "MUTABLE",
"imageScanningConfiguration": {
"scanOnPush": false
},
"encryptionConfiguration": {
"encryptionType": "AES256"
}
}
]
}
And then automatically replicated the nginx image to the DR site eu-north-1:
$ aws ecr list-images --repository-name nginx --region eu-north-1
{
"imageIds": [
{
"imageDigest": "sha256:d2b2f2980e9ccc570e5726b56b54580f23a018b7b7314c9eaff7e5e479c78657",
"imageTag": "latest"
}
]
}
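The image digest matches the one reported by the docker push to eu-west-1 above. It could also be cross-checked against the source region directly, for example with:
$ aws ecr describe-images --repository-name nginx --image-ids imageTag=latest \
    --region eu-west-1 --query 'imageDetails[0].imageDigest'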
Note that the repository URI on the DR site is different from the URI on the primary site: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx:latest. Therefore, in the event of a complete disaster at the primary site that also disables the private registry, we must make sure that the container images are pulled from the DR site’s repository. Otherwise, the applications will not start.
Now we can deploy the demo application on the EKS cluster on the primary site using the following manifest, which installs an NGINX deployment, a load balancer service, and a persistent volume claim backed by Amazon FSx for NetApp ONTAP into the namespace demo. The NGINX container image is pulled from the Amazon ECR repository in the eu-west-1 region:
$ cat sample-app.yaml
apiVersion: v1
kind: Namespace
metadata:
name: demo
---
apiVersion: v1
kind: Service
metadata:
name: demo-service
namespace: demo
labels:
app: demo
spec:
ports:
- port: 80
selector:
app: demo
tier: frontend
type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo-deployment
namespace: demo
labels:
app: demo
spec:
replicas: 1
selector:
matchLabels:
app: demo
tier: frontend
template:
metadata:
labels:
app: demo
tier: frontend
spec:
containers:
- image: 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
imagePullPolicy: Always
name: demo
ports:
- containerPort: 80
name: demo
volumeMounts:
- mountPath: /data
name: data
volumes:
- name: data
persistentVolumeClaim:
claimName: nginxdata
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nginxdata
namespace: demo
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
storageClassName: fsx-netapp-file
$ kubectl apply -f sample-app.yaml
namespace/demo created
service/demo-service created
deployment.apps/demo-deployment created
persistentvolumeclaim/nginxdata created
We check that the deployment was successful:
$ kubectl get all,pvc -n demo
NAME READY STATUS RESTARTS AGE
pod/demo-deployment-7bd7d8f4cf-5dh89 1/1 Running 0 106s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/demo-service LoadBalancer 172.20.56.137 <pending> 80:31099/TCP 107s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/demo-deployment 1/1 1 1 106s
NAME DESIRED CURRENT READY AGE
replicaset.apps/demo-deployment-7bd7d8f4cf 1 1 1 106s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/nginxdata Bound pvc-34fe0bf5-5a7f-49a9-bd46-f8af03a61f18 2Gi RWX fsx-netapp-file <unset> 107s
And that the NGINX container image was pulled from the correct repository:
$ kubectl -n demo describe pod/demo-deployment-7bd7d8f4cf-5dh89 | grep Image:
Image: 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
The EKS cluster eks-demo1-euwest1-cluster on which we deployed the demo application already has Trident protect and its CLI tridentctl-protect installed, and an AWS S3 bucket is configured as the AppVault CR pu-demo to store application backup data. We can therefore manage the demo application with Trident protect simply by defining its namespace as an application named demo:
$ tridentctl-protect create app demo --namespaces demo -n demo
Application "demo" created.
$ tridentctl-protect get application -A
+-----------+------+------------+-------+-----+
| NAMESPACE | NAME | NAMESPACES | STATE | AGE |
+-----------+------+------------+-------+-----+
| demo | demo | demo | Ready | 6s |
+-----------+------+------------+-------+-----+
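Because Trident protect can be driven through either its CLI or its CRDs, the same application definition could alternatively be applied as an Application custom resource. A hedged sketch of what such a manifest might look like (the includedNamespaces field is our assumption; verify the CRD schema of your Trident protect release):
$ kubectl apply -f - <<EOF
apiVersion: protect.trident.netapp.io/v1
kind: Application
metadata:
  name: demo
  namespace: demo
spec:
  includedNamespaces:
    - namespace: demo
EOF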
To regularly protect the demo application, we create a protection schedule with hourly backups to the AWS S3 bucket:
$ tridentctl-protect create schedule --app demo --appvault pu-demo --backup-retention 3 --granularity Hourly --minute 30 --snapshot-retention 3 -n demo
Schedule "demo-q5bi0f" created.
$ tridentctl-protect get schedule -n demo
+-------------+------+---------------+---------+-------+-------+-----+
| NAME | APP | SCHEDULE | ENABLED | STATE | ERROR | AGE |
+-------------+------+---------------+---------+-------+-------+-----+
| demo-q5bi0f | demo | Hourly:min=30 | true | | | 8s |
+-------------+------+---------------+---------+-------+-------+-----+
To change the container image URL from region eu-west-1 to region eu-north-1 after a restore, we adapt the post-restore URL-rewrite hook from our collection of example execution hooks in the Verda GitHub project. It consists of two parts: the actual post-restore execution hook script url-rewrite.sh, which swaps all container image URLs between two registries when invoked, and a hook execution container definition rewrite-infra.yaml, which we need to modify to fit our environment.
Because the url-rewrite.sh execution hook needs to run in a container with kubectl installed, the helper manifest deploys a generic Alpine container and installs kubectl in it. It also creates a ServiceAccount and a RoleBinding with the necessary permissions in the application namespace.
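Conceptually, the swap performed by the hook boils down to something like the following simplified sketch (our own illustration with the demo namespace and registry URLs hard-coded; the actual url-rewrite.sh in Verda takes the registry URLs as arguments and adds error handling):
# Illustration only - not the Verda script itself.
OLD_REGISTRY="467886448844.dkr.ecr.eu-west-1.amazonaws.com"
NEW_REGISTRY="467886448844.dkr.ecr.eu-north-1.amazonaws.com"
for deploy in $(kubectl get deployments -n demo -o name); do
  # Rewrite every container image that references the old registry and re-apply.
  kubectl get "$deploy" -n demo -o json \
    | jq --arg old "$OLD_REGISTRY" --arg new "$NEW_REGISTRY" \
        '(.spec.template.spec.containers[].image) |= sub($old; $new)' \
    | kubectl apply -f -
done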
To adapt and deploy the hook components, we clone the Verda GitHub repository and change into the Verda/URL-rewrite directory:
$ git clone https://github.com/NetApp/Verda.git
Cloning into 'Verda'...
remote: Enumerating objects: 317, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 317 (delta 54), reused 30 (delta 30), pack-reused 233 (from 1)
Receiving objects: 100% (317/317), 88.24 KiB | 1.76 MiB/s, done.
Resolving deltas: 100% (151/151), done.
$ cd Verda/URL-rewrite
First, we need to adapt the manifest for the helper tools to our sample application. We make sure that the namespace values are set to the namespace demo of the sample app and that the labels fit our application needs:
$ cat rewrite-infra.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: kubectl-ns-admin-sa
namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kubectl-ns-admin-sa
namespace: demo
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: admin
subjects:
- kind: ServiceAccount
name: kubectl-ns-admin-sa
namespace: demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: tp-hook-deployment
namespace: demo
labels:
app: demo
spec:
replicas: 1
selector:
matchLabels:
app: demo
template:
metadata:
labels:
app: demo
spec:
serviceAccountName: kubectl-ns-admin-sa
containers:
- name: alpine-tp-hook
image: alpine:latest
env:
- name: KUBECTL_VERSION
value: "1.23.9"
command: ["/bin/sh"]
args:
- -c
- >
apk add curl jq &&
curl -sLO https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl &&
mv kubectl /usr/bin/kubectl &&
chmod +x /usr/bin/kubectl &&
trap : TERM INT; sleep infinity & wait
Assuming that we're not allowed to use a public registry for the Alpine image, we also create a (replicated) private repository for it and push the image there from our local repository, following the same steps as above for the NGINX image:
$ docker push 467886448844.dkr.ecr.eu-west-1.amazonaws.com/alpine:latest
The push refers to repository [467886448844.dkr.ecr.eu-west-1.amazonaws.com/alpine]
78a822fe2a2d: Pushed
latest: digest: sha256:25fad2a32ad1f6f510e528448ae1ec69a28ef81916a004d3629874104f8a7f70 size: 528
We find that the repositoryUri of the alpine ECR repository on the DR site eu-north-1 is 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:
$ aws ecr describe-repositories --region eu-north-1 --repository-name alpine
{
"repositories": [
{
"repositoryArn": "arn:aws:ecr:eu-north-1:467886448844:repository/alpine",
"registryId": "467886448844",
"repositoryName": "alpine",
"repositoryUri": "467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine",
"createdAt": "2023-06-26T12:54:40+02:00",
"imageTagMutability": "MUTABLE",
"imageScanningConfiguration": {
"scanOnPush": false
},
"encryptionConfiguration": {
"encryptionType": "AES256"
}
}
]
}
With the Alpine image replicated to the DR site as well, we adapt the helper manifest to pull the Alpine image from the DR site's repository and save it as rewrite-infra-ECR-DR.yaml:
$ cat rewrite-infra-ECR-DR.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: kubectl-ns-admin-sa
namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kubectl-ns-admin-sa
namespace: demo
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: admin
subjects:
- kind: ServiceAccount
name: kubectl-ns-admin-sa
namespace: demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: tp-hook-deployment
namespace: demo
labels:
app: demo
spec:
replicas: 1
selector:
matchLabels:
app: demo
template:
metadata:
labels:
app: demo
spec:
serviceAccountName: kubectl-ns-admin-sa
containers:
- name: alpine-tp-hook
image: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
env:
- name: KUBECTL_VERSION
value: "1.23.9"
command: ["/bin/sh"]
args:
- -c
- >
apk add curl jq &&
curl -sLO https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl &&
mv kubectl /usr/bin/kubectl &&
chmod +x /usr/bin/kubectl &&
trap : TERM INT; sleep infinity & wait
We can now deploy the hook components into the namespace of the sample application and confirm that the helper pod is running:
$ kubectl apply -f rewrite-infra-ECR-DR.yaml
serviceaccount/kubectl-ns-admin-sa created
rolebinding.rbac.authorization.k8s.io/kubectl-ns-admin-sa created
deployment.apps/tp-hook-deployment created
$ kubectl get all,pvc -n demo
NAME READY STATUS RESTARTS AGE
pod/demo-deployment-7bd7d8f4cf-5dh89 1/1 Running 0 43m
pod/tp-hook-deployment-6cbdc996f-pprdc 1/1 Running 0 11s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/demo-service LoadBalancer 172.20.56.137 <pending> 80:31099/TCP 43m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/demo-deployment 1/1 1 1 43m
deployment.apps/tp-hook-deployment 1/1 1 1 11s
NAME DESIRED CURRENT READY AGE
replicaset.apps/demo-deployment-7bd7d8f4cf 1 1 1 43m
replicaset.apps/tp-hook-deployment-6cbdc996f 1 1 1 11s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/nginxdata Bound pvc-34fe0bf5-5a7f-49a9-bd46-f8af03a61f18 2Gi RWX fsx-netapp-file <unset> 43m
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
volumesnapshot.snapshot.storage.k8s.io/snapshot-01f9facb-580f-4b25-9d49-e149c55fd540-pvc-34fe0bf5-5a7f-49a9-bd46-f8af03a61f18 true nginxdata 312Ki csi-trident-snapclass snapcontent-1b0794ca-9ad4-427a-9f5d-c7b196dfef33 21m 21m
After confirming that the Alpine image was pulled from the private ECR repository on the DR site, we're all set to add the post-restore URL-rewrite hook to the sample application in Trident protect:
$ kubectl -n demo describe pod/tp-hook-deployment-6cbdc996f-pprdc | grep Image:
Image: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
In the next step, we create the post-restore hook post-restore-url-rewrite, using url-rewrite.sh as the hook script, the primary and DR registry URLs as its arguments, and a match on the helper container name alpine-tp-hook. The CLI command to create the execution hook in the demo namespace is:
$ tridentctl-protect create exechook post-restore-url-rewrite --action Restore --stage Post --app demo --source-file ./url-rewrite.sh --arg 467886448844.dkr.ecr.eu-west-1.amazonaws.com --arg 467886448844.dkr.ecr.eu-north-1.amazonaws.com --match containerName:alpine-tp-hook -n demo
ExecHook "post-restore-url-rewrite" created.
We verify that the hook was created successfully and is enabled:
$ tridentctl-protect get exechook -n demo
+--------------------------+------+------------------------------+---------+-------+---------+-------+-----+
| NAME | APP | MATCH | ACTION | STAGE | ENABLED | ERROR | AGE |
+--------------------------+------+------------------------------+---------+-------+---------+-------+-----+
| post-restore-url-rewrite | demo | containerName:alpine-tp-hook | Restore | Post | true | | 47s |
+--------------------------+------+------------------------------+---------+-------+---------+-------+-----+
Now we can test application restores to the DR site eu-north-1 and locally to the same cluster.
To test a restore to the DR site, we use a second cluster eks-demo2-eunorth1-cluster in the DR location eu-north-1 with Trident protect installed on it. The DR cluster has an AppVault CR pu-demo configured that points to the same AWS S3 bucket pu-demo hosting the backups taken from the primary cluster:
$ tridentctl-protect get appvault
+---------+----------+-----------+-------+---------+-----+
| NAME | PROVIDER | STATE | ERROR | MESSAGE | AGE |
+---------+----------+-----------+-------+---------+-----+
| pu-demo | AWS | Available | | | 6s |
+---------+----------+-----------+-------+---------+-----+
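For completeness, an AppVault CR like pu-demo is created from a Kubernetes secret holding the S3 credentials, either as a CR or with the Trident protect CLI. A hedged sketch with the CLI (the secret name s3-creds is our assumption, and flag and secret key names may differ between Trident protect releases; check tridentctl-protect create appvault --help):
$ kubectl create secret generic s3-creds \
    --from-literal=accessKeyID=<AWS_ACCESS_KEY_ID> \
    --from-literal=secretAccessKey=<AWS_SECRET_ACCESS_KEY> \
    -n trident-protect
$ tridentctl-protect create appvault AWS pu-demo --bucket pu-demo --secret s3-creds -n trident-protect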
We can list the available application backups in the S3 bucket from the DR cluster:
$ tridentctl-protect get appvaultcontent pu-demo --show-paths --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster
+---------------+------+--------+-----------------------------+-----------+---------------------------+--------------------------------------------------------------------------------------------------------------------+
| CLUSTER | APP | TYPE | NAME | NAMESPACE | TIMESTAMP | PATH |
+---------------+------+--------+-----------------------------+-----------+---------------------------+--------------------------------------------------------------------------------------------------------------------+
| demo1-euwest1 | demo | backup | hourly-bdd12-20250424053000 | demo | 2025-04-24 05:31:38 (UTC) | demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424053000_5ec3eebc-62cf-4fb4-b6aa-96c37985692d |
| demo1-euwest1 | demo | backup | hourly-bdd12-20250424063000 | demo | 2025-04-24 06:31:42 (UTC) | demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424063000_e37cbdcc-680a-44a6-9a7f-9c5d4eb38a18 |
| demo1-euwest1 | demo | backup | hourly-bdd12-20250424073000 | demo | 2025-04-24 07:31:37 (UTC) | demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424073000_3e294f60-67f3-4a9e-87bd-2ea64a91be63 |
+---------------+------+--------+-----------------------------+-----------+---------------------------+--------------------------------------------------------------------------------------------------------------------+
We choose to restore from the most recent backup hourly-bdd12-20250424073000 and start the restore on the DR cluster (into the original application namespace demo) by specifying the backup path within the archive on the CLI command, then check the progress of the restore:
$ tridentctl-protect create backuprestore --appvault pu-demo --path demo_b5714a51-7b90-49d2-941d-7526439a2341/backups/hourly-bdd12-20250424073000_3e294f60-67f3-4a9e-87bd-2ea64a91be63 --namespace-mapping demo:demo --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster -n demo
BackupRestore "demo-a4xvta" created.
$ kubectl -n demo get BackupRestore demo-a4xvta -w
NAME STATE ERROR AGE
demo-a4xvta Running 14s
demo-a4xvta Running 26s
demo-a4xvta Running 26s
demo-a4xvta Running 37s
demo-a4xvta Running 37s
demo-a4xvta Running 37s
demo-a4xvta Running 37s
demo-a4xvta Running 37s
demo-a4xvta Running 46s
demo-a4xvta Running 46s
demo-a4xvta Running 46s
demo-a4xvta Running 46s
demo-a4xvta Running 61s
demo-a4xvta Running 61s
demo-a4xvta Running 70s
demo-a4xvta Running 70s
demo-a4xvta Running 71s
demo-a4xvta Running 71s
demo-a4xvta Running 71s
demo-a4xvta Completed 71s
The demo app comes up on the DR cluster eks-demo2-eunorth1-cluster:
$ kubectl get all,pvc -n demo --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster
NAME READY STATUS RESTARTS AGE
pod/demo-deployment-5df8dcc7df-79t6m 1/1 Running 0 4h7m
pod/tp-hook-deployment-66bf969f5c-pln4q 1/1 Running 0 4h7m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/demo-service LoadBalancer 172.20.94.103 <pending> 80:31013/TCP 4h7m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/demo-deployment 1/1 1 1 4h7m
deployment.apps/tp-hook-deployment 1/1 1 1 4h7m
NAME DESIRED CURRENT READY AGE
replicaset.apps/demo-deployment-5df8dcc7df 1 1 1 4h7m
replicaset.apps/demo-deployment-7bd7d8f4cf 0 0 0 4h7m
replicaset.apps/tp-hook-deployment-66bf969f5c 1 1 1 4h7m
replicaset.apps/tp-hook-deployment-6cbdc996f 0 0 0 4h7m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/nginxdata Bound pvc-37a929d9-34d8-443d-a230-8eee73e02fe3 2Gi RWX fsx-netapp-file <unset> 4h7m
The detailed steps of the execution hook run can be seen in the ExecHooksRun CR that's created during the restore job:
$ kubectl -n demo get exechooksrun --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster
NAME STATE STAGE ACTION ERROR APP AGE
post-restore-demo-a4xvta Completed Post Restore demo 4h11m
$ kubectl -n demo get exechooksrun post-restore-demo-a4xvta --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster -o yaml | yq '.status'
completionTimeout: 30m0s
completionTimestamp: "2025-04-24T08:29:08Z"
conditions:
- lastTransitionTime: "2025-04-24T08:29:01Z"
message: Found 1 matching container/exechook pairs
reason: Done
status: "True"
type: RetrievedMatchingContainers
- lastTransitionTime: "2025-04-24T08:29:01Z"
message: Successfully reconciled
reason: Done
status: "True"
type: WaitForReadiness
- lastTransitionTime: "2025-04-24T08:29:08Z"
message: Successfully reconciled
reason: Done
status: "True"
type: ProcessMatchingContainers
- lastTransitionTime: "2025-04-24T08:29:08Z"
message: Successfully reconciled
reason: Done
status: "True"
type: Completed
matchingContainers:
- completionTimestamp: "2025-04-24T08:29:08Z"
containerImage: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
containerName: alpine-tp-hook
execHookRef: post-restore-url-rewrite
execHookUID: afd07266-bd1a-4d7c-bb0d-1f436b5c36c3
jobName: ehr-fc46237de1c8444a199d4800cda4b165
namespace: demo
podName: tp-hook-deployment-6cbdc996f-4q5mq
podUID: fb14ed00-17a5-46b0-9662-0a966c514c37
startTimestamp: "2025-04-24T08:29:01Z"
state: Completed
Checking the container image of the sample application, we see that it was indeed pulled from the nginx ECR repository on the DR site eu-north-1:
$ kubectl -n demo describe pod/demo-deployment-5df8dcc7df-79t6m --context arn:aws:eks:eu-north-1:467886448844:cluster/eks-demo2-eunorth1-cluster | grep Image:
Image: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx:latest
When doing a local restore, either to the same cluster eks-demo1-euwest1-cluster or to another cluster in the primary location, the post-restore URL-rewrite hook would also be executed. If pulling the container images from the DR site registry is not wanted or possible for local restores, you can either add logic to the hook that skips the rewrite when it's not running in the DR region (one possible approach is sketched below), or disable the post-restore URL-rewrite hook and execute it manually only when needed after a restore. We show the procedure to disable the post-restore hook and execute it manually below.
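One possible guard, purely as an illustration and not part of the Verda script: let the hook determine the AWS region it runs in, for example from the EC2 instance metadata service, and exit early unless it is the DR region. This assumes IMDS is reachable from pods in your cluster:
# Illustration only: skip the rewrite unless we are running in the DR region.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
CURRENT_REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/placement/region)
if [ "$CURRENT_REGION" != "eu-north-1" ]; then
  echo "Not running in the DR region ($CURRENT_REGION); skipping image URL rewrite."
  exit 0
fi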
To disable the post-restore-url-rewrite hook on the primary cluster after its creation, we patch it and set the /spec/enabled value to false:
$ kubectl -n demo get exechooks
NAME STAGE ACTION APP ENABLED ERROR AGE
post-restore-url-rewrite Post Restore demo true 17h
$ kubectl patch execHook post-restore-url-rewrite -n demo --type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": false}]'
exechook.protect.trident.netapp.io/post-restore-url-rewrite patched
$ kubectl -n demo get exechooks
NAME STAGE ACTION APP ENABLED ERROR AGE
post-restore-url-rewrite Post Restore demo false 17h
The disabled state of the hook is preserved in any subsequent backups, so the hook won't be executed upon restore from them. Let's test this and run an on-demand backup named hook-disabled:
$ tridentctl-protect create backup hook-disabled --app demo --appvault pu-demo -n demo
Backup "hook-disabled" created.
Now we restore from this backup into a new namespace demo2 after finding the backup path in the AppVault:
$ tridentctl-protect get appvaultcontent pu-demo --show-paths
+---------------+------+--------+---------------+-----------+---------------------------+------------------------------------------------------------------------------------------------------+
| CLUSTER | APP | TYPE | NAME | NAMESPACE | TIMESTAMP | PATH |
+---------------+------+--------+---------------+-----------+---------------------------+------------------------------------------------------------------------------------------------------+
| demo1-euwest1 | demo | backup | hook-disabled | demo | 2025-04-30 15:08:31 (UTC) | demo_2f2472e5-be17-40e3-a04b-166f27cb61f3/backups/hook-disabled_916c0058-4436-483e-bf94-c3ad541449ac |
+---------------+------+--------+---------------+-----------+---------------------------+------------------------------------------------------------------------------------------------------+
$ tridentctl-protect create backuprestore --appvault pu-demo --path demo_2f2472e5-be17-40e3-a04b-166f27cb61f3/backups/hook-disabled_916c0058-4436-483e-bf94-c3ad541449ac --namespace-mapping demo:demo2 -n demo2
BackupRestore "demo-jusg1t" created.
The restore completes after a few minutes. Once the application is up and running, we can confirm that the image location was not changed and that the container images were pulled from the ECR repository on the primary site:
$ kubectl get all,pvc -n demo2
NAME READY STATUS RESTARTS AGE
pod/demo-deployment-7bd7d8f4cf-qxqp9 1/1 Running 0 49s
pod/tp-hook-deployment-6cbdc996f-fp6z7 1/1 Running 0 49s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/demo-service LoadBalancer 172.20.31.179 <pending> 80:31132/TCP 49s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/demo-deployment 1/1 1 1 49s
deployment.apps/tp-hook-deployment 1/1 1 1 49s
NAME DESIRED CURRENT READY AGE
replicaset.apps/demo-deployment-7bd7d8f4cf 1 1 1 49s
replicaset.apps/tp-hook-deployment-6cbdc996f 1 1 1 49s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/nginxdata Bound pvc-e81cae8b-9acc-4cf8-bb0b-bbd6d649d7df 2Gi RWX fsx-netapp-file <unset> 51s
$ kubectl -n demo2 describe pod/demo-deployment-7bd7d8f4cf-qxqp9 | grep Image:
Image: 467886448844.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest
A “Found 0 matching container/exechook pairs” message in the status of the execution hook run confirms that the hook was not executed:
$ kubectl -n demo2 get exechooksrun
NAME STATE STAGE ACTION ERROR APP AGE
post-restore-demo-jusg1t Completed Post Restore demo 2m44s
$ kubectl -n demo2 describe exechooksrun post-restore-demo-jusg1t
Name: post-restore-demo-jusg1t
Namespace: demo2
Labels: <none>
Annotations: protect.trident.netapp.io/correlationid: dc8ab4d2-11ad-4d26-a69c-785637ae6501
API Version: protect.trident.netapp.io/v1
Kind: ExecHooksRun
Metadata:
Creation Timestamp: 2025-04-30T15:14:21Z
Finalizers:
protect.trident.netapp.io/delete-jobs
Generation: 1
Owner References:
API Version: protect.trident.netapp.io/v1
Block Owner Deletion: true
Controller: true
Kind: BackupRestore
Name: demo-jusg1t
UID: 480b9773-2f53-46e8-9a0c-ee63e78cbf41
Resource Version: 348037
UID: be1ace14-b6f4-4cc3-8864-0d43da59b85d
Spec:
Action: Restore
App Archive Path: demo_cbfb8846-fb7a-4b4c-b583-183ece3a89d5/resourcebackups/protect-resource-backup-480b9773-2f53-46e8-9a0c-ee63e78cbf41_424312dc-562e-48c9-b78c-abad75cb0e52
App Vault Ref: pu-demo
Application Ref: demo
Completion Timeout: 0s
Resource Filter:
Stage: Post
Status:
Completion Timeout: 0s
Completion Timestamp: 2025-04-30T15:14:21Z
Conditions:
Last Transition Time: 2025-04-30T15:14:21Z
Message: Found 0 matching container/exechook pairs
Reason: Done
Status: True
Type: RetrievedMatchingContainers
Last Transition Time: 2025-04-30T15:14:21Z
Message: No matching containers to wait for
Reason: Done
Status: True
Type: WaitForReadiness
Last Transition Time: 2025-04-30T15:14:21Z
Message: Successfully reconciled
Reason: Done
Status: True
Type: ProcessMatchingContainers
Last Transition Time: 2025-04-30T15:14:21Z
Message: Successfully reconciled
Reason: Done
Status: True
Type: Completed
Matching Containers:
State: Completed
Events: <none>
To execute the post-restore-url-rewrite hook manually after the restore of the application into the demo2 namespace, we need to re-enable the hook in the demo2 namespace, take a backup of only the application resources (without the persistent volumes), and then create an ExecHooksRun CR to run the hook.
Let’s enable the hook by patching it again:
$ kubectl -n demo2 get exechooks
NAME STAGE ACTION APP ENABLED ERROR AGE
post-restore-url-rewrite Post Restore demo false 23h
$ kubectl patch execHook post-restore-url-rewrite -n demo2 --type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": true}]'
exechook.protect.trident.netapp.io/post-restore-url-rewrite patched
$ kubectl -n demo2 get exechooks
NAME STAGE ACTION APP ENABLED ERROR AGE
post-restore-url-rewrite Post Restore demo true 23h
Now we create a ResourceBackup with the command below; the AppArchivePath value can be chosen arbitrarily:
$ tridentctl-protect create resourcebackup --app demo --appvault pu-demo --app-archive-path demo_hook_enabled/resourcebackups/my_res_backup -n demo2
ResourceBackup "demo-kt7iv1" created.
$ kubectl -n demo2 get ResourceBackup demo-kt7iv1
NAME STATE ERROR AGE
demo-kt7iv1 Completed 45s
With the AWS CLI, we can check the content of the S3 bucket and the ResourceBackup:
$ aws s3 ls s3://pu-demo
PRE demo_2f2472e5-be17-40e3-a04b-166f27cb61f3/
PRE demo_hook_enabled/
2025-04-23 14:05:06 39 appVault.json
$ aws s3 ls s3://pu-demo/demo_hook_enabled/ --recursive
2025-04-30 17:45:20 1049 demo_hook_enabled/resourcebackups/my_res_backup/application.json
2025-04-30 17:45:20 4860 demo_hook_enabled/resourcebackups/my_res_backup/exec_hooks.json
2025-04-30 17:45:20 2523 demo_hook_enabled/resourcebackups/my_res_backup/resource_backup.json
2025-04-30 17:45:22 11138 demo_hook_enabled/resourcebackups/my_res_backup/resource_backup.tar.gz
2025-04-30 17:45:22 6235 demo_hook_enabled/resourcebackups/my_res_backup/resource_backup_summary.json
Finally, we create a manual ExechooksRun using the Trident protect CLI:
$ tridentctl-protect create exechooksrun --action restore --stage post --app demo --appvault pu-demo --path demo_hook_enabled/resourcebackups/my_res_backup -n demo2
ExecHooksRun "ehr-m5w8tj" created.
The execution hook run finishes in a couple of seconds, and we can confirm its successful run in the alpine-tp-hook container:
$ kubectl -n demo2 get ExecHooksRun ehr-m5w8tj
NAME STATE STAGE ACTION ERROR APP AGE
ehr-m5w8tj Completed Post Restore demo 18s
$ kubectl -n demo2 describe ExecHooksRun ehr-m5w8tj
Name: ehr-m5w8tj
Namespace: demo2
Labels: <none>
Annotations: <none>
API Version: protect.trident.netapp.io/v1
Kind: ExecHooksRun
Metadata:
Creation Timestamp: 2025-04-30T15:47:57Z
Finalizers:
protect.trident.netapp.io/delete-jobs
Generation: 1
Resource Version: 356207
UID: 9e1240ed-fa95-44c3-b50e-bdf45cdba876
Spec:
Action: Restore
App Archive Path: demo_hook_enabled/resourcebackups/my_res_backup
App Vault Ref: pu-demo
Application Ref: demo
Completion Timeout: 0s
Resource Filter:
Stage: Post
Status:
Completion Timeout: 30m0s
Completion Timestamp: 2025-04-30T15:48:01Z
Conditions:
Last Transition Time: 2025-04-30T15:47:57Z
Message: Found 1 matching container/exechook pairs
Reason: Done
Status: True
Type: RetrievedMatchingContainers
Last Transition Time: 2025-04-30T15:47:57Z
Message: Successfully reconciled
Reason: Done
Status: True
Type: WaitForReadiness
Last Transition Time: 2025-04-30T15:48:01Z
Message: Successfully reconciled
Reason: Done
Status: True
Type: ProcessMatchingContainers
Last Transition Time: 2025-04-30T15:48:01Z
Message: Successfully reconciled
Reason: Done
Status: True
Type: Completed
Matching Containers:
Completion Timestamp: 2025-04-30T15:48:01Z
Container Image: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/alpine:latest
Container Name: alpine-tp-hook
Exec Hook Ref: post-restore-url-rewrite
Exec Hook UID: 090571f6-fb12-48ce-b205-8f9e9d08a947
Job Name: ehr-74c2b8641add0aba8742b5e2a046ff36
Namespace: demo2
Pod Name: tp-hook-deployment-6cbdc996f-fp6z7
Pod UID: f483a0ff-88bf-4010-b75e-81560f2b6fa6
Start Timestamp: 2025-04-30T15:47:57Z
State: Completed
Events: <none>
Lastly, let’s confirm that the image was now actually pulled from the repository on the DR site:
$ kubectl get all,pvc -n demo2
NAME READY STATUS RESTARTS AGE
pod/demo-deployment-5df8dcc7df-ttjrv 1/1 Running 0 99s
pod/tp-hook-deployment-66bf969f5c-hg5xx 1/1 Running 0 99s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/demo-service LoadBalancer 172.20.31.179 <pending> 80:31132/TCP 35m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/demo-deployment 1/1 1 1 35m
deployment.apps/tp-hook-deployment 1/1 1 1 35m
NAME DESIRED CURRENT READY AGE
replicaset.apps/demo-deployment-5df8dcc7df 1 1 1 100s
replicaset.apps/demo-deployment-7bd7d8f4cf 0 0 0 35m
replicaset.apps/tp-hook-deployment-66bf969f5c 1 1 1 100s
replicaset.apps/tp-hook-deployment-6cbdc996f 0 0 0 35m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/nginxdata Bound pvc-e81cae8b-9acc-4cf8-bb0b-bbd6d649d7df 2Gi RWX fsx-netapp-file <unset> 35m
$ kubectl describe pod/demo-deployment-5df8dcc7df-ttjrv -n demo2 | grep Image:
Image: 467886448844.dkr.ecr.eu-north-1.amazonaws.com/nginx:latest
Depending on your requirements, you may want to disable the execution hook of the restored application again:
$ kubectl patch execHook post-restore-url-rewrite -n demo2 --type='json' -p='[{"op": "replace", "path": "/spec/enabled", "value": false}]'
exechook.protect.trident.netapp.io/post-restore-url-rewrite patched
$ kubectl -n demo2 get exechooks
NAME STAGE ACTION APP ENABLED ERROR AGE
post-restore-url-rewrite Post Restore demo false 37m
In certain scenarios, it’s crucial to change K8s application definitions after a restore. With its execution hooks framework, Trident protect offers custom actions that can be configured to run in conjunction with a data protection operation of a managed app.
Trident protect supports several types of execution hooks, based on when they can be run. The Verda GitHub project contains a collection of example execution hooks for various applications and scenarios.
In this blog post, we showed how to leverage execution hooks to change the image URL of container images after an application restore to a DR site with a different repository URL, following the sample post-restore URL-rewrite hook in Verda; how to disable execution hooks; and how to execute them manually. The same mechanism can, for example, be used to change an ingress configuration after a restore.