Tech ONTAP Blogs

Rescale Kubernetes applications with Trident protect post-restore hooks

PatricU
NetApp

Introduction

Cloning a K8s application for testing purposes or restoring it for disaster recovery may require scaling down (or up) the number of replicas to accommodate the available resources or performance requirements.

This applies to both on-premises and cloud-based Kubernetes deployments. The data management and backup system protecting the Kubernetes applications therefore needs to be able to modify Kubernetes configurations after a restore or clone operation. The same holds for other settings that may need to change on the DR site, such as the ingress configuration.

 

NetApp® Trident™ protect provides application-aware data protection, mobility, and disaster recovery for any workload running on any Kubernetes distribution. Trident protect enables administrators to easily protect, back up, migrate, and create working clones of Kubernetes applications, through either its CLI or its Kubernetes-native custom resource definitions (CRDs).

 

Trident protect offers various types of execution hooks—custom scripts that you can configure to run in conjunction with a data protection operation of a managed app. With a post-restore hook, you can, for example, scale down the number of replicas of a deployment after an application restore or clone. Read on to find out how.

Setup

We use the post-restore-scale hook example to demonstrate how to scale down an NGINX sample application after a restore into a new namespace on the same cluster. The example uses an Azure Kubernetes Service (AKS) cluster to host the NGINX application, but the procedure is valid for all K8s clusters supported by Trident protect – in the cloud or on premises.

Sample application

The manifest below defines our NGINX demo application. It is deployed in the namespace demo, and the three NGINX pods all mount the same volume. The PV is backed by Azure NetApp Files (ANF), which supports the ReadWriteMany access mode.

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: web
    spec:
      containers:
      - image: nginx:latest
        name: nginx
        resources: {}
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginxdata
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginxdata
  namespace: demo
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: azure-netapp-files-standard
EOF
namespace/demo created
deployment.apps/web created
persistentvolumeclaim/nginxdata created

After we deploy the sample application to our AKS cluster, the pods come up shortly:

~$ kubectl get all,pvc -n demo 
NAME                       READY   STATUS    RESTARTS   AGE
pod/web-64cdb84b99-6bvcl   1/1     Running   0          16m
pod/web-64cdb84b99-7kw5j   1/1     Running   0          16m
pod/web-64cdb84b99-8whkf   1/1     Running   0          16m
 
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   3/3     3            3           16m
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/web-64cdb84b99   3         3         3       16m
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-06e051cf-102f-4f6c-85ea-2f89f24cc1c5   50Gi       RWX            azure-netapp-files-standard   <unset>                 16m

Install Trident protect post-restore hook components

To scale down the NGINX sample application after a restore, we add the post-restore-scale hook from our collection of example execution hooks in the Verda GitHub project and adapt it to our needs. Let’s clone the Verda GitHub repository and change into the Verda/Post-restore-scale directory:

$ git clone git@github.com:NetApp/Verda.git
Cloning into 'Verda'...
Enter passphrase for key '/Users/patricu/.ssh/id_rsa':
remote: Enumerating objects: 317, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 317 (delta 54), reused 30 (delta 30), pack-reused 233 (from 1)
Receiving objects: 100% (317/317), 88.24 KiB | 491.00 KiB/s, done.
Resolving deltas: 100% (151/151), done.
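
If you don't use SSH keys with GitHub, cloning the repository over HTTPS works just as well:

$ git clone https://github.com/NetApp/Verda.git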

~$ cd Verda/Post-restore-scale/
~/Verda/Post-restore-scale# ls -l
total 24
-rw-r--r--@ 1 patricu  staff  3417 Apr 16 10:47 post-restore-scale.sh
-rw-r--r--@ 1 patricu  staff  1666 Apr 16 10:47 README.md
-rw-r--r--@ 1 patricu  staff  1186 Apr 16 10:47 scale-infra.yaml

First, we need to adapt the manifest for the helper tools to our sample application: we make sure that the namespace values are set to the sample app's namespace demo and that the labels fit our application:

$ cat scale-infra.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tp-hook-deployment
  namespace: demo
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      serviceAccountName: kubectl-ns-admin-sa
      containers:
      - name: alpine-tp-hook
        image: alpine:latest
        env:
          - name: KUBECTL_VERSION
            value: "1.30.1"
        command: ["/bin/sh"]
        args:
        - -c
        - >
          apk add curl jq py3-pip &&
          curl -sLO https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl &&
          mv kubectl /usr/bin/kubectl &&
          chmod +x /usr/bin/kubectl &&
          trap : TERM INT; sleep infinity & wait
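
When adapting the manifest, it's also worth making sure that KUBECTL_VERSION is close to your cluster's Kubernetes version (the usual kubectl skew policy allows a difference of one minor version). The server version can be checked with:

$ kubectl version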

With the manifest for the helper pod adapted to our application, we can deploy it into the namespace of the sample application and confirm that the helper pod is running:

$ kubectl apply -f scale-infra.yaml
serviceaccount/kubectl-ns-admin-sa created
rolebinding.rbac.authorization.k8s.io/kubectl-ns-admin-sa created
deployment.apps/tp-hook-deployment created

$ kubectl get all,pvc -n demo
NAME                                     READY   STATUS    RESTARTS   AGE
pod/tp-hook-deployment-b8888bb85-5dq5w   1/1     Running   0          17s
pod/web-64cdb84b99-6bvcl                 1/1     Running   0          41h
pod/web-64cdb84b99-7kw5j                 1/1     Running   0          41h
pod/web-64cdb84b99-8whkf                 1/1     Running   0          41h
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/tp-hook-deployment   1/1     1            1           18s
deployment.apps/web                  3/3     3            3           41h
 
NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/tp-hook-deployment-cb85b84d7   1         1         1       18s
replicaset.apps/web-64cdb84b99                 3         3         3       41h
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-06e051cf-102f-4f6c-85ea-2f89f24cc1c5   50Gi       RWX            azure-netapp-files-standard   <unset>                 41h
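
Optionally, we can double-check that kubectl was installed inside the helper container (give it a moment after startup, since the container downloads kubectl when it starts):

$ kubectl -n demo exec deploy/tp-hook-deployment -c alpine-tp-hook -- kubectl version --client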

Manage sample application in Trident protect

The AKS cluster on which we deployed the demo application already has Trident protect installed and an AppVault custom resource configured. We can start managing our demo application right away by creating a Trident protect application named demo that contains the namespace demo, using the CLI:

 

$ tridentctl-protect create application demo --namespaces demo -n demo
Application "demo" created.
 
$ tridentctl-protect get application -A
+-----------+------+------------+-------+-----+
| NAMESPACE | NAME | NAMESPACES | STATE | AGE |
+-----------+------+------------+-------+-----+
| demo      | demo | demo       | Ready | 5s  |
+-----------+------+------------+-------+-----+
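
Since Trident protect is fully CRD-driven, the same definition could also be applied declaratively instead of using the CLI. The following is only a minimal sketch; the apiVersion and field names are our assumptions based on the Application CRD installed on our cluster, so verify them against your installation, for example by exporting the resource the CLI just created with kubectl -n demo get applications.protect.trident.netapp.io demo -o yaml:

apiVersion: protect.trident.netapp.io/v1
kind: Application
metadata:
  name: demo
  namespace: demo
spec:
  includedNamespaces:
    - namespace: demo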

Add post-restore execution hook

Now we can create the post-restore execution hook for our application with the Trident protect CLI by providing the following information:

1. Hook name:

  • We choose a unique name for the hook: demo-post-restore-scale

2. Action:

  • Restore

3. Stage:

  • Post

4. Application:

  • Our app’s name demo

5. Source file:

  • Path of the execution hook script

6. Hook arguments (mandatory for this specific hook):

  • Key-value pairs specifying the desired number of replicas for every deployment you want to change.
    • web=1 in our example, as we want to scale down the application to one NGINX pod after restoring.
  • You can specify as many valid key-value pairs as needed, and the order does not matter. Invalid entries cause the hook to fail (no rescaling), but the restore itself will still succeed.

7. Hook filter (defines the container in which the hook script will be executed):

  • Hook filter type: We select containerName from the available options of containerImage, containerName, podName, podLabel, and namespaceName.

With these details, the CLI command to create the execution hook in the demo namespace is:

$ tridentctl-protect create exechook demo-post-restore-scale --action Restore --stage Post --app demo --source-file ./post-restore-scale.sh --arg web=1 --match containerName:alpine-tp-hook -n demo
ExecHook "demo-post-restore-scale" created.

We verify that the hook was created successfully and is enabled:

$ tridentctl-protect get exechooks -n demo
+-------------------------+------+------------------------------+---------+-------+---------+-------+-----+
|          NAME           | APP  |            MATCH             | ACTION  | STAGE | ENABLED | ERROR | AGE |
+-------------------------+------+------------------------------+---------+-------+---------+-------+-----+
| demo-post-restore-scale | demo | containerName:alpine-tp-hook | Restore | Post  | true    |       | 5s  |
+-------------------------+------+------------------------------+---------+-------+---------+-------+-----+

 

Test application rescale after cloning

Now we’re ready to run a test and see if the post-restore hook scales down the application after a restore operation.

Let’s first create a snapshot of the sample application from which we can restore in the second step:

$ tridentctl-protect create snapshot --appvault demo --app demo -n demo
Snapshot "demo-g8du8c" created.

$ tridentctl-protect get snapshot -n demo
+-------------+------+----------------+---------+-------+-----+
|    NAME     | APP  | RECLAIM POLICY |  STATE  | ERROR | AGE |
+-------------+------+----------------+---------+-------+-----+
| demo-g8du8c | demo | Delete         | Running |       | 12s |
+-------------+------+----------------+---------+-------+-----+

$ tridentctl-protect get snapshot -n demo
+-------------+------+----------------+-----------+-------+-------+
|    NAME     | APP  | RECLAIM POLICY |   STATE   | ERROR |  AGE  |
+-------------+------+----------------+-----------+-------+-------+
| demo-g8du8c | demo | Delete         | Completed |       | 2m20s |
+-------------+------+----------------+-----------+-------+-------+
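
The snapshot is also represented by a Snapshot custom resource in the demo namespace, which makes it easy to script around. The fully qualified resource name below is an assumption based on the Trident protect CRD group; adjust it if it differs on your installation:

$ kubectl -n demo get snapshots.protect.trident.netapp.io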

Once the snapshot operation has completed, we restore from the snapshot demo-g8du8c into the new namespace demo-clone with the following command:

$ tridentctl-protect create snapshotrestore --snapshot demo/demo-g8du8c --namespace-mapping demo:demo-clone -n demo-clone
SnapshotRestore "demo-wmidex" created.

The restore finishes within a few minutes. Using kubectl, we can confirm that the cloned application in the namespace demo-clone is now running with only one replica and that the post-restore-scale hook worked as expected:

~$ kubectl get all,pvc -n demo-clone
NAME                                     READY   STATUS    RESTARTS   AGE
pod/tp-hook-deployment-b8888bb85-zw6sj   1/1     Running   0          90s
pod/web-64cdb84b99-4kj8w                 1/1     Running   0          90s
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/tp-hook-deployment   1/1     1            1           90s
deployment.apps/web                  1/1     1            1           90s
 
NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/tp-hook-deployment-b8888bb85   1         1         1       90s
replicaset.apps/web-64cdb84b99                 1         1         1       90s
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-5501d440-db6e-4a59-a763-4d7a33bc0a68   50Gi       RWX            azure-netapp-files-standard   <unset>                 96s

The details of the web deployment show that the replica set was scaled down from 3 to 1. We also see the annotation original-replicas: 3, which the post-restore-scale hook added to record the replica count of the original application (we'll use it in a moment to scale the clone back up):

~$ kubectl -n demo-clone describe deployment.apps/web
Name:                   web
Namespace:              demo-clone
CreationTimestamp:      Fri, 16 Apr 2025 11:55:23 +0000
Labels:                 app=web
Annotations:            deployment.kubernetes.io/revision: 1
                        original-replicas: 3
Selector:               app=web
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=web
  Containers:
   nginx:
    Image:        nginx:latest
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /data from data (rw)
  Volumes:
   data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nginxdata
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   web-679bd7c944 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  5m1s  deployment-controller  Scaled down replica set web-64cdb84b99 to 1 from 3
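
The annotation also gives us a convenient way to scale the clone back to its original size later, should we need the full replica count again; a small sketch:

$ kubectl -n demo-clone scale deployment web --replicas=$(kubectl -n demo-clone get deployment web -o jsonpath='{.metadata.annotations.original-replicas}')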

The details of the execution hook run are kept in the status of the SnapshotRestore custom resource; if an error had occurred, we would find it there:

$ kubectl -n demo-clone get snapshotrestore demo-wmidex -o yaml | yq '.status.postRestoreExecHooksRunResults'
- completionTimestamp: "2025-04-16T11:56:15Z"
  containerImage: alpine:latest
  containerName: alpine-tp-hook
  execHookRef: demo-post-restore-scale
  execHookUID: c9e4f175-d90e-41d0-b5ee-7f058504961f
  jobName: ehr-fd180042ba56bd01635e7c5a424de110
  namespace: demo-clone
  podName: tp-hook-deployment-b8888bb85-zw6sj
  podUID: 4e382034-da09-45e8-9cbe-d56bcb307ce3
  startTimestamp: "2025-04-16T11:56:08Z"

Conclusion

In certain scenarios, it's crucial to change K8s application definitions after a restore or clone operation. With its execution hooks framework, Trident protect lets you run custom scripts in conjunction with a data protection operation of a managed app.

 

Trident protect supports the following types of execution hooks, based on when they can be run:

  • Pre-snapshot
  • Post-snapshot
  • Pre-backup
  • Post-backup
  • Post-restore
  • Post-failover

The Verda GitHub project contains a collection of example execution hooks for various applications and scenarios.

 

In this blog post, we showed how to use Trident protect's execution hooks framework to scale down an application after restoring it into a new namespace (a clone), using the sample post-restore-scale hook from Verda. The hook script can rescale an arbitrary number of deployments after a restore or clone operation and stores the original number of replicas in an annotation on each affected deployment.
