How to rescale Kubernetes applications with Astra Control post-restore hooks

By Michael Haigh (@MichaelHaigh) and Patric Uebele, Technical Marketing Engineers at NetApp

Introduction

Cloning a Kubernetes application for testing or restoring it for disaster recovery may require scaling the number of replicas down (or up) to match the resources available or the performance required at the destination. This applies to both on-premises and cloud-based Kubernetes deployments. The data management and backup system that protects the applications must therefore be able to modify Kubernetes configurations after a restore or clone operation. The same capability is needed for other settings that might differ on the disaster recovery site, such as the ingress configuration.

 

NetApp® Astra™ Control provides application-aware data protection, mobility, and disaster recovery for any workload running on any K8s distribution. It’s available both as a fully managed service (Astra Control Service; ACS) and as self-managed software (Astra Control Center; ACC). Astra Control enables administrators to easily protect, back up, migrate, and create working clones of K8s applications, through either its UI or robust APIs.

 

Astra Control offers various types of execution hooks: custom scripts that you can configure to run in conjunction with a data protection operation of a managed app. With a post-restore hook, you can, for example, scale down the number of replicas of a deployment after an application restore or clone. Read on to find out how.

Setup

We use the post-restore-scale hook example to demonstrate how to scale down an NGINX sample application after a restore into a new namespace on the same cluster. The example uses a Google Kubernetes Engine (GKE) cluster to host the NGINX application, but the procedure is valid for all K8s clusters supported by Astra Control, in a cloud or on premises.

Sample application

The following manifest defines our NGINX demo application. It will be deployed in the namespace demo, and three NGINX pods will mount the same volume. The persistent volume is backed by NetApp Cloud Volumes Service, which supports the ReadWriteMany access mode.

 

~# cat nginx-gke-rwx-cvs.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: web
    spec:
      containers:
      - image: nginx:latest
        name: nginx
        resources: {}
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginxdata
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginxdata
  namespace: demo
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: netapp-cvs-perf-standard

 

We deploy the sample application to our GKE cluster and wait for the pods to come up:

 

~# kubectl apply -f nginx-gke-rwx-cvs.yaml
namespace/demo created
deployment.apps/web created
persistentvolumeclaim/nginxdata created
~# kubectl get all,pvc -o wide -n demo
NAME                                         READY   STATUS    RESTARTS   AGE     IP           NODE                                           NOMINATED NODE   READINESS GATES
pod/web-679bd7c944-2vb9q                     1/1     Running   0          3m20s   10.60.0.8    gke-pu-gke-test-1-default-pool-624d6e94-tscw   <none>           <none>
pod/web-679bd7c944-6tlcb                     1/1     Running   0          3m20s   10.60.1.7    gke-pu-gke-test-1-default-pool-624d6e94-x159   <none>           <none>
pod/web-679bd7c944-jxfzd                     1/1     Running   0          3m20s   10.60.2.14   gke-pu-gke-test-1-default-pool-624d6e94-d911   <none>           <none>

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS          IMAGES          SELECTOR
deployment.apps/web                     3/3     3            3           3m22s   nginx               nginx:latest    app=web

NAME                                               DESIRED   CURRENT   READY   AGE     CONTAINERS          IMAGES          SELECTOR
replicaset.apps/web-679bd7c944                     3         3         3       3m22s   nginx               nginx:latest    app=web,pod-template-hash=679bd7c944

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE     VOLUMEMODE
persistentvolumeclaim/nginxdata   Bound    pvc-d39039e6-bd6c-4616-b42f-94cbe45f26b3   100Gi      RWX            netapp-cvs-perf-standard   3m22s   Filesystem
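
Instead of polling kubectl get until all pods are Running, the wait can be scripted. The following is a minimal sketch, not part of the original walkthrough; the function name wait_for_rollout and the default timeout are our own assumptions, while kubectl rollout status and its --timeout flag are standard kubectl.

```shell
# Block until the Deployment reports its rollout as complete, or give up
# after the timeout. Arguments: namespace, deployment name, optional timeout.
# (wait_for_rollout is a hypothetical helper name, not from the blog post.)
wait_for_rollout() {
  ns="$1"
  deploy="$2"
  timeout="${3:-120s}"
  kubectl rollout status "deployment/${deploy}" -n "$ns" --timeout="$timeout"
}

# Usage (not executed here):
#   wait_for_rollout demo web 180s
```

The function returns the exit status of kubectl, so it can gate later steps in a script.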

 

Install Astra post-restore hook components

To scale down the NGINX sample application after a restore, we add the post-restore-scale hook from our collection of example execution hooks in the Verda GitHub project and adapt it to our needs. First, we clone the Verda GitHub repository and change into the Verda/Post-restore-scale directory:

 

~# git clone git@github.com:NetApp/Verda.git
Cloning into 'Verda'...
Enter passphrase for key '/root/.ssh/id_rsa':
remote: Enumerating objects: 284, done.
remote: Counting objects: 100% (133/133), done.
remote: Compressing objects: 100% (108/108), done.
remote: Total 284 (delta 64), reused 67 (delta 25), pack-reused 151
Receiving objects: 100% (284/284), 78.43 KiB | 319.00 KiB/s, done.
Resolving deltas: 100% (130/130), done.
~# cd Verda/Post-restore-scale/
~/Verda/Post-restore-scale# ls -l
total 12
-rw-r--r-- 1 root root 1666 Aug  8 15:48 README.md
-rw-r--r-- 1 root root 3417 Aug  8 15:48 post-restore-scale.sh
-rw-r--r-- 1 root root 1186 Aug  8 15:48 scale-infra.yaml

 

Next, we need to adapt the manifest for the helper tools to our sample application. We make sure that the namespace values are set to the namespace demo of the sample app and that the labels fit our application needs:

 

~# cat scale-infra.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubectl-ns-admin-sa
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: kubectl-ns-admin-sa
  namespace: demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: astra-hook-deployment
  namespace: demo
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      serviceAccountName: kubectl-ns-admin-sa
      containers:
      - name: alpine-astra-hook
        image: alpine:latest
        env:
          - name: KUBECTL_VERSION
            value: "1.23.9"
        command: ["/bin/sh"]
        args:
        - -c
        - >
          apk add curl jq py3-pip &&
          curl -sLO https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl &&
          mv kubectl /usr/bin/kubectl &&
          chmod +x /usr/bin/kubectl &&
          trap : TERM INT; sleep infinity & wait
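
If the helper manifest has to be adapted for several applications, the namespace values can be rewritten with a small filter rather than by hand. This is only a sketch under the assumption that every namespace key sits on its own line, as it does in the manifest above; the function name set_namespace and the output file name are hypothetical. Labels still have to be reviewed manually.

```shell
# Rewrite the value of every "namespace:" key in a manifest read from stdin.
# Indentation is preserved; only the value after the key is replaced.
# Assumes each namespace key is on its own line (true for scale-infra.yaml).
set_namespace() {
  sed -E "s/^([[:space:]]*namespace:[[:space:]]*).*/\1$1/"
}

# Usage (not executed here):
#   set_namespace demo < scale-infra.yaml > scale-infra-demo.yaml
```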

 

With the manifest for the helper pod adapted to our application, we can deploy it into the namespace of the sample application and confirm that the helper pod is running:

 

~# kubectl apply -f scale-infra.yaml
serviceaccount/kubectl-ns-admin-sa created
rolebinding.rbac.authorization.k8s.io/kubectl-ns-admin-sa created
deployment.apps/astra-hook-deployment created

~# kubectl get all,pvc -n demo
NAME                                         READY   STATUS    RESTARTS   AGE
pod/astra-hook-deployment-758ffd85fc-bw6kk   1/1     Running   0          23s
pod/web-679bd7c944-2vb9q                     1/1     Running   0          3m20s
pod/web-679bd7c944-6tlcb                     1/1     Running   0          3m20s
pod/web-679bd7c944-jxfzd                     1/1     Running   0          3m20s

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/astra-hook-deployment   1/1     1            1           24s
deployment.apps/web                     3/3     3            3           3m22s

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/astra-hook-deployment-758ffd85fc   1         1         1       24s
replicaset.apps/web-679bd7c944                     3         3         3       3m22s

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
persistentvolumeclaim/nginxdata   Bound    pvc-d39039e6-bd6c-4616-b42f-94cbe45f26b3   100Gi      RWX            netapp-cvs-perf-standard   3m21s
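
A running helper pod does not by itself prove that its service account is allowed to change deployments. One way to check this ahead of a restore is kubectl auth can-i with impersonation. The sketch below assumes that the user running it may impersonate service accounts; the function name sa_can is hypothetical.

```shell
# Ask the API server whether a service account may perform a verb on
# Deployments in its namespace. kubectl prints "yes" or "no".
# Arguments: verb, namespace, service account name.
sa_can() {
  verb="$1"
  ns="$2"
  sa="$3"
  kubectl auth can-i "$verb" deployments -n "$ns" \
    --as="system:serviceaccount:${ns}:${sa}"
}

# Usage (not executed here):
#   sa_can patch demo kubectl-ns-admin-sa
```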

 

Manage sample application in Astra Control

The GKE cluster on which we deployed the demo application is already managed by Astra Control Service, so the sample application was automatically discovered by ACS, and we can manage it directly by defining its namespace as an application in ACS:

Figure 1: Managing the demo namespace as an application in ACS.

Add post-restore execution hook

First, we upload the post-restore-scale hook script to the library of execution hooks in our Astra Control account. In Account > Scripts, click Add:

Figure 2: Adding the post-restore-scale hook script to the ACS account.

Then we upload the post-restore-scale.sh script from the cloned Verda repository on our laptop to Astra Control and name the script accordingly:

Figure 3: Uploading the post-restore-scale hook script to ACS from the cloned Verda GitHub repository.

With the post-restore-scale hook script now available in the hooks library of our ACS account, we can add it as a post-restore hook to the sample application. In the Execution Hooks tab in the Application Details view of the demo application, click Add:

Figure 4: Adding the execution hook to the sample application.

In the next screen, we configure the post-restore hook with the necessary details:

  1. Operation:
    • Select Post-restore from the drop-down list.
  2. Hook arguments (mandatory for this specific hook):
    • Key-value pairs specifying the desired number of replicas for every deployment you want to change.
    • web=1 in our example, because we want to scale down the application to one NGINX pod after restoring.
    • You can specify as many valid key-value pairs as needed, and the order does not matter. An invalid entry causes the hook to fail, so no rescaling takes place, but the restore itself still succeeds.
  3. Hook name:
    • Enter a unique name for the hook.
  4. Hook filter (defines the container in which the hook script will be executed):
    • Hook filter type: Select Container Name from the drop-down list.
    • Enter alpine-astra-hook as a regular expression in Regular Expression 2 (RE2) syntax.
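
The actual logic lives in post-restore-scale.sh in the Verda repository. To illustrate how key-value hook arguments such as web=1 can be handled, here is a minimal sketch that is not a copy of that script. The function names parse_pair and rescale are hypothetical, and the namespace is passed explicitly as an assumption. Only the original-replicas annotation key is taken from this post.

```shell
# Validate one argument of the form <deployment>=<replicas>.
# Prints "<deployment> <replicas>" on success, returns non-zero otherwise.
parse_pair() {
  case "$1" in
    *=*) ;;
    *) return 1 ;;
  esac
  name="${1%%=*}"
  count="${1#*=}"
  [ -n "$name" ] || return 1
  case "$count" in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo "$name $count"
}

# Rescale every deployment named in the arguments.
# Arguments: namespace, then one or more <deployment>=<replicas> pairs.
rescale() {
  ns="$1"
  shift
  # Validate everything first, so that one bad entry changes nothing.
  for pair in "$@"; do
    parse_pair "$pair" >/dev/null || {
      echo "invalid argument: $pair" >&2
      return 1
    }
  done
  for pair in "$@"; do
    name="${pair%%=*}"
    count="${pair#*=}"
    # Remember the current replica count before changing it.
    current=$(kubectl get deployment "$name" -n "$ns" \
      -o jsonpath='{.spec.replicas}') || return 1
    kubectl annotate deployment "$name" -n "$ns" --overwrite \
      "original-replicas=${current}" || return 1
    kubectl scale deployment "$name" -n "$ns" --replicas="$count" || return 1
  done
}

# Usage (not executed here):
#   rescale demo-clone web=1
```

Validating all pairs before touching the cluster mirrors the behavior described in the list: a single invalid entry means no rescaling at all.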

Figure 5: Specifying hook details.

In the next screen we select the post-restore-scale.sh script from the available hook scripts in our ACS account:

Figure 6: Selecting the hook script.

In the final review screen, we check the hook parameters and add the hook to the sample application:

Figure 7: Final review of the hook parameters.

To verify that the execution hook will run in the correct container, alpine-astra-hook, we check that the container image matches in the Details view of the demo-post-restore-scale execution hook:

Figure 8: Confirming that the hook is executed in the right container.

Test application rescale after live cloning

Now we’re ready to run a test to see if the post-restore hook scales down the application. In the Astra Control UI, from the Applications view, we start a live clone operation for the demo application:

Figure 9: Starting live clone of demo application.

We enter demo-clone as both the name and the namespace of the application that Astra Control will create, and then select the destination cluster:

Figure 10: Live clone details.

In the Summary step, we confirm that the post-restore-scale hook will be executed after the clone operation and then start the clone process:

Figure 11: Summary of the clone process.

Astra Control now makes a NetApp Snapshot™ copy of the sample application’s data and metadata and copies them into the destination namespace demo-clone. As a final step, the post-restore hook is executed and scales down the NGINX deployment. The steps are logged in the Astra Control Activity log:

Figure 12: Activity log entries for the clone operation.

Using the kubectl command, we confirm that the cloned application demo-clone is now running with one replica only and that the post-restore-scale hook worked as expected:

 

~# kubectl get all,pvc -n demo-clone -o wide
NAME                                         READY   STATUS    RESTARTS   AGE     IP           NODE                                           NOMINATED NODE   READINESS GATES
pod/astra-hook-deployment-758ffd85fc-5mm9j   1/1     Running   0          3m46s   10.60.0.12   gke-pu-gke-test-1-default-pool-624d6e94-tscw   <none>           <none>
pod/web-679bd7c944-ldj6l                     1/1     Running   0          3m47s   10.60.1.9    gke-pu-gke-test-1-default-pool-624d6e94-x159   <none>           <none>

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS          IMAGES          SELECTOR
deployment.apps/astra-hook-deployment   1/1     1            1           3m47s   alpine-astra-hook   alpine:latest   app=demo
deployment.apps/web                     1/1     1            1           3m48s   nginx               nginx:latest    app=web

NAME                                               DESIRED   CURRENT   READY   AGE     CONTAINERS          IMAGES          SELECTOR
replicaset.apps/astra-hook-deployment-758ffd85fc   1         1         1       3m47s   alpine-astra-hook   alpine:latest   app=demo,pod-template-hash=758ffd85fc
replicaset.apps/web-679bd7c944                     1         1         1       3m48s   nginx               nginx:latest    app=web,pod-template-hash=679bd7c944

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE     VOLUMEMODE
persistentvolumeclaim/nginxdata   Bound    pvc-987e8255-3b1c-41bc-85cd-7278061b78c1   100Gi      RWX            netapp-cvs-perf-standard   3m49s   Filesystem
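
For automated tests, the same check can be done without reading the table by eye. The following sketch queries the desired replica count with a jsonpath expression and compares it with an expected value; the function name assert_replicas is our own assumption.

```shell
# Compare a Deployment's desired replica count with an expected value.
# Arguments: namespace, deployment name, expected replica count.
assert_replicas() {
  ns="$1"
  deploy="$2"
  want="$3"
  have=$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.replicas}') || return 1
  if [ "$have" = "$want" ]; then
    echo "ok: ${deploy} has ${have} replica(s)"
  else
    echo "mismatch: ${deploy} has ${have}, expected ${want}" >&2
    return 1
  fi
}

# Usage (not executed here):
#   assert_replicas demo-clone web 1
```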

 

The details of the web deployment show that the replica set was scaled down from 3 to 1. They also show the annotation original-replicas: 3, which the post-restore-scale hook adds to record the replica count of the original application:

 

~# kubectl -n demo-clone describe deployment.apps/web
Name:                   web
Namespace:              demo-clone
CreationTimestamp:      Thu, 10 Aug 2023 11:48:23 +0000
Labels:                 app=web
                        app.netapp.io/managed-by=astra.netapp.io
Annotations:            deployment.kubernetes.io/revision: 1
                        original-replicas: 3
Selector:               app=web
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=web
           app.netapp.io/managed-by=astra.netapp.io
  Containers:
   nginx:
    Image:        nginx:latest
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /data from data (rw)
  Volumes:
   data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nginxdata
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   web-679bd7c944 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  5m1s  deployment-controller  Scaled down replica set web-679bd7c944 to 1 from 3
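
Because the original replica count is kept in the original-replicas annotation, the clone can later be scaled back to it. The following is a sketch of one way to do that, not something the hook does on its own; the function name restore_replicas is hypothetical.

```shell
# Scale a Deployment back to the count stored in its original-replicas
# annotation. Arguments: namespace, deployment name.
restore_replicas() {
  ns="$1"
  deploy="$2"
  orig=$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.metadata.annotations.original-replicas}') || return 1
  # Refuse to act if the annotation is missing or not a plain number.
  case "$orig" in
    ''|*[!0-9]*)
      echo "no usable original-replicas annotation on ${deploy}" >&2
      return 1
      ;;
  esac
  kubectl scale deployment "$deploy" -n "$ns" --replicas="$orig"
}

# Usage (not executed here):
#   restore_replicas demo-clone web
```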

 

Conclusion

In certain scenarios, it’s crucial to change K8s application definitions after a restore or clone operation. With its execution hooks framework, Astra Control offers custom actions that can be configured to run in conjunction with a data protection operation of a managed app.

 

Astra Control supports the following types of execution hooks, based on when they can be run:

  • Pre-snapshot
  • Post-snapshot
  • Pre-backup
  • Post-backup
  • Post-restore
  • Post-failover

The Verda GitHub project contains a collection of example execution hooks for various applications and scenarios.

 

In this blog post we showed how to leverage the Astra Control execution hooks framework to downscale an application after a live clone operation by using the sample post-restore-scale hook in Verda. The hook script can rescale an arbitrary number of deployments after a restore or clone operation and stores the original number of replicas in an annotation in the respective deployment.

Take advantage of NetApp’s continuing innovation

To see for yourself how easy it is to protect persistent Kubernetes applications with Astra Control, by using either its UI or the powerful Astra Toolkit, apply for a free trial. Get started today!
