Tech ONTAP Blogs

Storage class migration with Trident protect backup & restore

PatricU
NetApp

Introduction

When working with stateful, data-rich applications in Kubernetes, you might run into situations where moving your persistent volumes (PVs) to different storage back ends is required—for example, to achieve better performance or lower cost, or to phase out old storage hardware. When using dynamic provisioning, this means migrating your PVs to different storage classes. The data management capabilities of NetApp® Trident™ protect offer an easy and safe way to migrate persistent volumes to a different storage class while minimizing application downtime.

 

NetApp® Trident™ protect provides application-aware data protection, mobility, and disaster recovery for any workload running on any K8s distribution. Trident protect enables administrators to easily protect, back up, migrate, and create working clones of K8s applications, through either its CLI or its Kubernetes-native custom resource definitions (CRDs).

Setup

We use an NGINX application deployed on an Azure Kubernetes Service (AKS) cluster, with a persistent volume backed by Azure Disk via the AKS managed-csi storage class. We want to migrate this PV to Azure NetApp Files storage in the standard performance tier with minimal downtime and effort. The corresponding storage class was already created when we installed and configured Trident on the cluster:

$ kubectl get sc
NAME                                    PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azure-netapp-files-standard (default)   csi.trident.netapp.io   Delete          Immediate              true                   2d
azurefile                               file.csi.azure.com      Delete          Immediate              true                   2d
azurefile-csi                           file.csi.azure.com      Delete          Immediate              true                   2d
azurefile-csi-premium                   file.csi.azure.com      Delete          Immediate              true                   2d
azurefile-premium                       file.csi.azure.com      Delete          Immediate              true                   2d
default                                 disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   2d
managed                                 disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   2d
managed-csi                             disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   2d
managed-csi-premium                     disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   2d
managed-premium                         disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   2d
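
For reference, a minimal sketch of what the azure-netapp-files-standard storage class might look like, assuming a Trident backend for Azure NetApp Files with a virtual pool labeled performance=standard (the selector value is an assumption, not taken from the cluster above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-netapp-files-standard
provisioner: csi.trident.netapp.io
parameters:
  backendType: azure-netapp-files
  # Assumption: the Trident backend exposes a virtual pool labeled performance=standard
  selector: performance=standard
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true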

Here is the Kubernetes configuration of the NGINX application in the namespace web-ad:

$ kubectl get all,pvc -n web-ad
NAME                       READY   STATUS    RESTARTS   AGE
pod/web-64cdb84b99-sdfff   1/1     Running   0          20h
 
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           22h
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/web-64cdb84b99   1         1         1       22h
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-6ec8b2c0-bd05-4b50-b1bb-3ad970855a4d   2Gi        RWO            managed-csi    <unset>                 22h

Let’s add some random data to the application’s persistent volume:

$ for i in {1..5}; do  kubectl -n web-ad exec -it pod/web-64cdb84b99-sdfff -- dd if=/dev/urandom of=/data/file${i} bs=1024k count=10; done
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0498185 s, 210 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0608314 s, 172 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0531475 s, 197 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0577863 s, 181 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0535681 s, 196 MB/s
 
$ kubectl -n web-ad exec -it pod/web-64cdb84b99-sdfff -- ls -l /data
total 51216
-rw-r--r-- 1 root root 10485760 Apr 17 11:40 file1
-rw-r--r-- 1 root root 10485760 Apr 17 11:40 file2
-rw-r--r-- 1 root root 10485760 Apr 17 11:40 file3
-rw-r--r-- 1 root root 10485760 Apr 17 11:40 file4
-rw-r--r-- 1 root root 10485760 Apr 17 11:40 file5
drwx------ 2 root root    16384 Apr 16 13:24 lost+found
 
$ kubectl -n web-ad exec -it pod/web-64cdb84b99-sdfff -- df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc        2.0G   51M  1.9G   3% /data

Trident and Trident protect are already installed and configured on the cluster, so we can create the corresponding Trident protect application web-ad right away:

$ tridentctl-protect create app web-ad --namespaces web-ad -n web-ad
Application "web-ad" created.

$ tridentctl-protect get application -n web-ad
+--------+------------+-------+-----+
|  NAME  | NAMESPACES | STATE | AGE |
+--------+------------+-------+-----+
| web-ad | web-ad     | Ready | 19s |
+--------+------------+-------+-----+
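
Since every tridentctl-protect operation is backed by a Kubernetes CRD, the same application definition can also be created declaratively. Here is a minimal sketch of the equivalent Application custom resource, derived from the CLI flags above (verify the exact schema against the CRDs installed in your cluster):

apiVersion: protect.trident.netapp.io/v1
kind: Application
metadata:
  name: web-ad
  namespace: web-ad
spec:
  includedNamespaces:
    - namespace: web-ad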

The next sections walk you through two slightly different scenarios that show how to migrate the PV of the NGINX application to a different storage class with minimal application downtime.

Option 1: Clone to a new namespace

This option uses backup and restore with Trident protect to clone the NGINX application into a new namespace (web-clone in our example) and into a new storage class. If you don’t need to keep the application in the original namespace, this is the easiest, fastest, and safest way to migrate to a new storage class, because it leaves the original application untouched and gives you an easy failback path in the unlikely event that an error occurs.

 

First, we stop application traffic by scaling the web deployment down to zero replicas:

$ kubectl -n web-ad scale deployment.apps/web --replicas=0
deployment.apps/web scaled
 
$ kubectl get all,pvc -n web-ad
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   0/0     0            0           104m
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/web-64cdb84b99   0         0         0       104m
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-6ec8b2c0-bd05-4b50-b1bb-3ad970855a4d   2Gi        RWO            managed-csi    <unset>                 104m

Then we create a Trident protect backup, wait for it to complete with the tridentctl-protect wait command, and immediately restore from the backup into the new namespace web-clone, mapping the volume to the target storage class azure-netapp-files-standard with the --storageclass-mapping option of tridentctl-protect:

$ tridentctl-protect create backup web-bkp --appvault demo --app web-ad -n web-ad; tridentctl-protect wait backup web-bkp -n web-ad; tridentctl-protect create backuprestore --backup web-ad/web-bkp --namespace-mapping web-ad:web-clone --storageclass-mapping managed-csi:azure-netapp-files-standard -n web-clone
Backup "web-bkp" created.
Waiting for resource to be in final state: 0s
Resource is in final state: Completed
BackupRestore "web-ad-vrqjv3" created.

The restored app comes up quickly in the target namespace:

$ kubectl get all,pvc -n web-clone
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   0/0     0            0           4m15s
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/web-64cdb84b99   0         0         0       4m15s
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-03c8efd6-c68f-4927-b035-00e4d0809948   50Gi       RWO            azure-netapp-files-standard   <unset>                 4m17s

The last step after the clone operation is to start NGINX again in the new namespace by scaling up the web deployment:

$ kubectl -n web-clone scale deployment.apps/web --replicas=1
deployment.apps/web scaled
 
$ kubectl get all,pvc -n web-clone
NAME                       READY   STATUS    RESTARTS   AGE
pod/web-64cdb84b99-d2c84   1/1     Running   0          11s
 
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           4m48s
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/web-64cdb84b99   1         1         1       4m48s
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-03c8efd6-c68f-4927-b035-00e4d0809948   50Gi       RWO            azure-netapp-files-standard   <unset>                 4m51s

Finally, we confirm that the persistent data were successfully copied to the azure-netapp-files-standard storage class:

$ kubectl -n web-clone exec -it pod/web-64cdb84b99-d2c84 -- ls -l /data
total 51444
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file1
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file2
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file3
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file4
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file5
drwx------ 2 nobody nogroup     4096 Apr 16 13:24 lost+found
 
$ kubectl -n web-clone exec -it pod/web-64cdb84b99-d2c84 -- df -h /data
Filesystem                                           Size  Used Avail Use% Mounted on
10.21.2.4:/pvc-03c8efd6-c68f-4927-b035-00e4d0809948   50G   51M   50G   1% /data

Option 2: Restore to the same namespace

Use this approach if your workflows require the application to remain in the same K8s namespace after the storage class migration. In this case, we must delete the original namespace before we can restore the application backup into the same namespace and into a different storage class. Again, we use the Trident protect CLI for the migration.

 

Deleting the source namespace also deletes the Trident protect custom resources in that namespace, including the backup CR (but not the actual backup in the object storage, because the backups use the default reclaim policy of Retain). We therefore need to find and save the backup’s path in the object storage archive before deleting the namespace. With the appArchivePath value at hand, we can restore from the object storage archive without the backup CR. To make the steps less error prone, we can use this little script:

$ cat backuprestore-scmig.sh
#!/bin/bash
#
APPVAULT=demo
APP=web-ad
APPNS=web-ad
APPSC=managed-csi
CLONESC=azure-netapp-files-standard
BKUPNAME=web-bkp
 
# Create backup
tridentctl-protect create backup ${BKUPNAME} --appvault ${APPVAULT} --app ${APP} --reclaim-policy Retain -n ${APPNS}
 
# Wait for backup to finish
tridentctl-protect wait backup ${BKUPNAME} -n ${APPNS}
 
# Check if backup succeeded
BKUPSTATE=$(kubectl -n ${APPNS} get backup ${BKUPNAME} -o yaml | yq '.status.state')
if [[ $BKUPSTATE != "Completed" ]]
then
  printf "Backup didn't complete successfully, exiting. \n"
  exit 10
fi
 
# Get APPARCHIVEPATH
APPARCHIVEPATH=$(kubectl -n ${APPNS} get backup ${BKUPNAME} -o yaml | yq '.status.appArchivePath')
 
# Delete app namespace
kubectl delete ns ${APPNS}
 
# Run BackupRestore with Storage Class mapping:
tridentctl-protect create backuprestore --appvault ${APPVAULT} --path ${APPARCHIVEPATH} --namespace-mapping ${APPNS}:${APPNS} --storageclass-mapping ${APPSC}:${CLONESC} -n ${APPNS}

In more detail, the script

  1. Creates a Trident protect backup of the application
  2. Waits for the backup to finish
  3. Checks for a successful completion of the backup
  4. Gets the value of the appArchivePath from the backup and stores it in a variable (see the JSONPath alternative after this list)
  5. Deletes the application namespace
  6. Runs a Trident protect restore from the backup into the original namespace (re-created by the restore job) and into the new storage class
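
As a side note, if yq is not available, steps 3 and 4 can use kubectl’s built-in JSONPath output instead to read the same status fields:

# Alternative to the yq pipelines in the script above
BKUPSTATE=$(kubectl -n ${APPNS} get backup ${BKUPNAME} -o jsonpath='{.status.state}')
APPARCHIVEPATH=$(kubectl -n ${APPNS} get backup ${BKUPNAME} -o jsonpath='{.status.appArchivePath}')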

Now let’s run the script:

$ sh ./backuprestore-scmig.sh
Backup "web-bkp" created.
Waiting for resource to be in final state: 0s
Resource is in final state: Completed
namespace "web-ad" deleted
BackupRestore "web-ad-b5sr14" created.

Once the restore is complete, we start NGINX in the web-ad namespace by scaling up the web deployment:

$ kubectl -n web-ad scale deployment.apps/web --replicas=1
deployment.apps/web scaled
 
$ kubectl get all,pvc -n web-ad
NAME                       READY   STATUS    RESTARTS   AGE
pod/web-64cdb84b99-6vrhm   1/1     Running   0          21s
 
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           10m
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/web-64cdb84b99   1         1         1       10m
 
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/nginxdata   Bound    pvc-027a8609-c9de-47b1-aae9-5061b959ccc0   50Gi       RWO            azure-netapp-files-standard   <unset>                 10m

Finally, we check for the successful restore of the persistent data to the new storage class:

$ kubectl -n web-ad exec -it pod/web-64cdb84b99-6vrhm -- ls -l /data
total 51444
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file1
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file2
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file3
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file4
-rw-r--r-- 1 nobody nogroup 10485760 Apr 17 11:40 file5
drwx------ 2 nobody nogroup     4096 Apr 16 13:24 lost+found
 
$ kubectl -n web-ad exec -it pod/web-64cdb84b99-6vrhm -- df -h /data
Filesystem                                           Size  Used Avail Use% Mounted on
10.21.2.4:/pvc-027a8609-c9de-47b1-aae9-5061b959ccc0   50G   51M   50G   1% /data

Conclusion

When you need to migrate the data of your data-rich Kubernetes applications between storage back ends, the data management capabilities of NetApp Trident protect offer an easy and safe way to move persistent volumes to a different storage class with minimal application downtime.

 

In this blog post we demonstrated two different ways of migrating a stateful K8s application to a different storage class with Trident protect, depending on whether your application can be migrated to a different namespace or needs to remain in the same namespace after the storage migration.
