Tech ONTAP Blogs
By now, you're probably familiar with Trident, the powerful tool that enables seamless consumption and management of storage resources across all popular NetApp storage platforms. Whether you're working in the public cloud or on-premises, Trident has you covered. It supports a wide range of NetApp storage solutions, including ONTAP clusters (AFF, FAS, and ASA), ONTAP Select, Cloud Volumes ONTAP, Element software (NetApp HCI, SolidFire), Azure NetApp Files, Amazon FSx for NetApp ONTAP, and Google Cloud NetApp Volumes.
Trident is a fully supported, open-source project maintained by NetApp, designed to meet the persistence demands of your containerized applications using industry-standard interfaces like the Container Storage Interface (CSI).
CSI is a standard that allows storage vendors to provide drivers that expose block and file storage systems to containerized workloads on Kubernetes and Kubernetes-based platforms. It defines the interfaces Kubernetes leverages to implement the storage features needed to manage persistent storage for containerized workloads.
Trident, being CSI-compliant, integrates natively with Kubernetes and supports dynamic storage orchestration. You might have already used the volume snapshot feature in Kubernetes, which allows you to create point-in-time snapshots of single volumes. This feature relies on the CSI driver to implement specific snapshot capabilities, and Trident leverages the efficient NetApp Snapshot technology.
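For reference, a single-volume snapshot request is simply a VolumeSnapshot object that points at one PVC. Here is a minimal sketch; the names my-snapshot and my-pvc are placeholders, and trident-snapshotclass refers to the VolumeSnapshotClass created later in this post:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot               # placeholder name
spec:
  volumeSnapshotClassName: trident-snapshotclass
  source:
    persistentVolumeClaimName: my-pvc   # placeholder PVC name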
Did you know that Kubernetes introduced the Volume Group Snapshot as an alpha feature in v1.27 and promoted it to Beta in v1.32? Why is this feature important, and how does Trident support it? Let's explore!
Imagine workloads that span multiple persistent volumes, such as databases where data is stored in one volume and logs in another. To achieve application-consistent snapshots, you typically must quiesce the application, take individual snapshots of each volume, and then unquiesce the application. However, this approach has some drawbacks: the application must stay quiesced while every per-volume snapshot completes, the individual snapshots are not guaranteed to represent the same point in time, and orchestrating the sequence by hand is error-prone.
Clearly, this method has its limitations. So, what's the solution?
Kubernetes addresses this issue with the Volume Group Snapshot feature, designed for modern container workloads and VMs running as containers. This feature allows you to create crash-consistent snapshots of multiple PersistentVolumeClaims (PVCs) simultaneously. To support this, Kubernetes introduces three new API objects: VolumeGroupSnapshot, VolumeGroupSnapshotClass, and VolumeGroupSnapshotContent.
Trident 25.06 automatically detects new CRDs (specified for the Volume Group Snapshot feature) to enable the relevant feature gates in the Trident CSI sidecars. So, how do you use these CRDs to create the required Trident objects for Volume Group Snapshots? Let's dive in!
Here is the outline of the steps I followed to demonstrate the use of a VolumeGroupSnapshotClass to create snapshots of a group of volumes and then restore from those snapshots:
1. Prepare an OpenShift cluster running Kubernetes v1.32 and enable the VolumeGroupSnapshot feature gate.
2. Install Trident 25.06.1 and verify that the VolumeGroupSnapshot CRDs are present.
3. Create the Trident backend, the iSCSI storage class, and a VolumeSnapshotClass.
4. Set the storage class and the VolumeSnapshotClass as defaults.
5. Create a VolumeGroupSnapshotClass.
6. Install OpenShift Virtualization. This needs to be done after step 4 so that the golden images are made available as VolumeSnapshots in the cluster using Trident CSI.
7. Create a VM with 2 additional disks and add some data to each of the disks.
8. Label the PVCs of the VM with a common key/value pair.
9. Create a VolumeGroupSnapshot that uses a label selector to match the labels set on the PVCs, and verify that snapshots are created for all 3 PVCs.
Let me elaborate on each of the above steps:
1. Here is my OpenShift Cluster v4.19 installed with Kubernetes v1.32
# oc get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 17d v1.32.6
master2 Ready control-plane,master 17d v1.32.6
master3 Ready control-plane,master 17d v1.32.6
worker1 Ready worker 17d v1.32.6
worker2 Ready worker 17d v1.32.6
worker3 Ready worker 17d v1.32.6
I turned the FeatureGate on for VolumeGroupSnapshot by doing the following:
Use the OpenShift web console by navigating to Administration -> Custom Resource Definitions, then searching for and clicking on "FeatureGate". From there, click the "Instances" tab and select the "cluster" instance.
Edit the YAML: Within the YAML tab, edit the FeatureGate/cluster object to include VolumeGroupSnapshot in the enabled list under customNoUpgrade.
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
      - VolumeGroupSnapshot
2. Install Trident 25.06.1, using the node-prep option to install the iSCSI tools.
# tridentctl install -n trident --node-prep iscsi
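Before going further, it is worth confirming that the installation completed and that all Trident pods are running; these are standard checks rather than additional configuration:
# tridentctl version -n trident
# oc get pods -n trident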
Ensure that the required CRDs for VolumeGroupSnapshots are installed:
[root@localhost trident-installer]# kubectl get crd | grep "volumegroupsnapshot"
volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io 2025-08-12T16:12:46Z
volumegroupsnapshotcontents.groupsnapshot.storage.k8s.io 2025-08-12T16:12:46Z
volumegroupsnapshots.groupsnapshot.storage.k8s.io 2025-08-12T16:12:46Z
3. Create the backend and storage class for the iSCSI protocol, and create a VolumeSnapshotClass object, using the following YAML definitions.
# cat tbc-iscsi.yaml
apiVersion: v1
kind: Secret
metadata:
  name: tbc-iscsi-secret
type: Opaque
stringData:
  username: admin
  password: <passwd to log into ONTAP CLI>
---
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: tbc-iscsi
spec:
  version: 1
  storageDriverName: ontap-san
  managementLIF: <mgmt-lif>
  backendName: tbc-iscsi
  svm: openshift
  storagePrefix: openshift-iscsi
  defaults:
    formatOptions: "-E nodiscard"
    nameTemplate: "{{ .config.StoragePrefix }}_{{ .volume.Namespace }}_{{ .volume.RequestName }}"
  credentials:
    name: tbc-iscsi-secret
# cat sc-iscsi.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-iscsi
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-san"
  provisioningType: "thin"
  fsType: ext4
  snapshots: "true"
reclaimPolicy: "Delete"
allowVolumeExpansion: true
# cat snapshotclass.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: trident-snapshotclass
driver: csi.trident.netapp.io
deletionPolicy: Retain
[root@localhost volumeGroups]# oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
sc-iscsi (default) csi.trident.netapp.io Delete Immediate true 2d22h
thin-csi csi.vsphere.vmware.com Delete WaitForFirstConsumer true 18d
[root@localhost volumeGroups]# oc get volumesnapshotclass
NAME DRIVER DELETIONPOLICY AGE
csi-vsphere-vsc csi.vsphere.vmware.com Delete 18d
trident-snapshotclass csi.trident.netapp.io Delete 2d22h
4. Set the storage class and the VolumeSnapshotClass as the defaults.
kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl patch volumesnapshotclass <volumesnapshotclass-name> --type=merge -p '{"metadata":{"annotations":{"snapshot.storage.kubernetes.io/is-default-class":"true"}}}'
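For example, with the storage class and snapshot class created above, the two commands look like this:
kubectl patch storageclass sc-iscsi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl patch volumesnapshotclass trident-snapshotclass --type=merge -p '{"metadata":{"annotations":{"snapshot.storage.kubernetes.io/is-default-class":"true"}}}'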
Note: It is a best practice to set both the Trident storage class and the Trident snapshot class as the defaults, so that new VMs are created from snapshots of the golden image using the fast FlexClone mechanism.
5. Create a VolumeGroupSnapshotClass.
# cat volumegroupsnapshotclass.yaml
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshotClass
metadata:
  name: trident-groupsnapshotclass
  annotations:
    kubernetes.io/description: "Trident group snapshot class"
driver: csi.trident.netapp.io
deletionPolicy: Delete
# oc get volumegroupsnapshotclass
NAME DRIVER DELETIONPOLICY AGE
trident-groupsnapshotclass csi.trident.netapp.io Delete 2d22h
6. After installing OpenShift Virtualization, you can verify that the golden images are available as VolumeSnapshots.
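On my cluster, the golden images created by OpenShift Virtualization live in the openshift-virtualization-os-images namespace (the namespace may differ in your environment), so a quick check looks like this:
# oc get volumesnapshot -n openshift-virtualization-os-images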
7. Now create a VM using the default template. Create 2 additional disks using the default storage class for this VM.
# oc get vm
NAME AGE STATUS READY
fedora-vm1 62s Running True
# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS
dv-fedora-vm1-disk1-ulsgg2 Bound pvc-6cfe08d6-6910-44ed-b671-1d23e9cf04d1 10Gi RWX sc-iscsi <unset>
dv-fedora-vm1-disk2-86oq76 Bound pvc-619bb1b5-6e1a-4193-ab04-4e16361fe699 20Gi RWX sc-iscsi <unset>
fedora-vm1 Bound pvc-5e278dc3-79b3-47be-85c1-1b84acb151ec 30Gi RWX sc-iscsi <unset>
You can check the corresponding volumes in the ONTAP backend.
The root disk volume is a FlexClone volume of the golden-image snapshot.
The other 2 volumes, for the 2 additional disks of the VM, are FlexVol volumes.
I logged in to the VM using the virtctl tool, then formatted and mounted the 2 disks as shown below:
fedora-vm1 login: fedora
Password:
[fedora@fedora-vm1 ~]$ sudo mkfs.ext4 /dev/vdc
[fedora@fedora-vm1 ~]$ sudo mkfs.ext4 /dev/vdd
[fedora@fedora-vm1 ~]$ sudo mkdir -p /mnt/data1 /mnt/data2
[fedora@fedora-vm1 ~]$ sudo mount /dev/vdc /mnt/data1
[fedora@fedora-vm1 ~]$ sudo mount /dev/vdd /mnt/data2
[fedora@fedora-vm1 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
zram0 251:0 0 1.8G 0 disk [SWAP]
vda 253:0 0 30G 0 disk
├─vda1 253:1 0 2M 0 part
├─vda2 253:2 0 100M 0 part /boot/efi
├─vda3 253:3 0 1000M 0 part /boot
└─vda4 253:4 0 28.9G 0 part /var
/home
/
vdb 253:16 0 1M 0 disk
vdc 253:32 0 10G 0 disk /mnt/data1
vdd 253:48 0 20G 0 disk /mnt/data2
I then created a file called sample.txt on each of the disks.
[fedora@fedora-vm1 data1]$ pwd
/mnt/data1
[fedora@fedora-vm1 data1]$ ls
lost+found sample.txt
[fedora@fedora-vm1 data2]$ pwd
/mnt/data2
[fedora@fedora-vm1 data2]$ ls
lost+found sample.txt
[fedora@fedora-vm1 data2]$
8. Now label each PVC of the VM with the same key/value pair.
# oc label pvc fedora-vm1 consistencygroup=group1
persistentvolumeclaim/fedora-vm1 labeled
# oc label pvc dv-fedora-vm1-disk1-ulsgg2 consistencygroup=group1
persistentvolumeclaim/dv-fedora-vm1-disk1-ulsgg2 labeled
# oc label pvc dv-fedora-vm1-disk2-86oq76 consistencygroup=group1
persistentvolumeclaim/dv-fedora-vm1-disk2-86oq76 labeled
Check the labels of the PVCs:
[root@localhost volumeGroups]# oc get pvc fedora-vm1 -o jsonpath='{.metadata.labels.consistencygroup}'
group1
[root@localhost volumeGroups]# oc get pvc dv-fedora-vm1-disk1-ulsgg2 -o jsonpath='{.metadata.labels.consistencygroup}'
group1
[root@localhost volumeGroups]# oc get pvc dv-fedora-vm1-disk2-86oq76 -o jsonpath='{.metadata.labels.consistencygroup}'
group1
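As an optional cross-check, you can also list all members of the group with the same label selector that the VolumeGroupSnapshot will use in the next step:
# oc get pvc -l consistencygroup=group1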
9. Now let us create a VolumeGroupSnapshot using the following YAML.
# cat vgs.yaml
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: vgs1
spec:
  volumeGroupSnapshotClassName: trident-groupsnapshotclass
  source:
    selector:
      matchLabels:
        consistencygroup: group1
# oc create -f vgs.yaml
volumegroupsnapshot.groupsnapshot.storage.k8s.io/vgs1 created
# oc get vgs/vgs1
NAME READYTOUSE VOLUMEGROUPSNAPSHOTCLASS VOLUMEGROUPSNAPSHOTCONTENT CREATIONTIME AGE
vgs1 true trident-groupsnapshotclass groupsnapcontent-82e42f0f-d421-4743-bbaf-f56ee56241d1 2m9s 2m26s
# oc get volumesnapshots
NAME READYTOUSE SOURCEPVC RESTORESIZE
snapshot-4d47c9f45423bfca625a0f1b6c5a5ec456ac59d3e583157be919bb7237317c65 true dv-fedora-vm1-disk1-ulsgg2 10Gi
snapshot-61c1aada41e28c4fd68327ad10b5561657ed4c7d391d1547569a47204f5f92b9 true fedora-vm1 30Gi
snapshot-afb4c4833460e233c4e86f1108c921b86a6f4d0eb182e99e579081ff6f743f56 true dv-fedora-vm1-disk2-86oq76 20Gi
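If you want to inspect the backing group object as well, you can list the VolumeGroupSnapshotContent that was created for the group; its name matches the VOLUMEGROUPSNAPSHOTCONTENT column shown above:
# oc get volumegroupsnapshotcontent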
A snapshot of every PVC carrying the label key/value pair consistencygroup: group1 is created.
Note: Trident VolumeGroupSnapshots use an ONTAP consistency group (CG) in the ONTAP backend. Using the ONTAP REST API, a CG is created with all the volumes belonging to the VM (as grouped by the labels), a snapshot of each volume is taken in a consistent way, and then the CG is deleted. Depending on the timing, you may or may not be able to see the consistency group being created and deleted in ONTAP. Here, I captured the consistency group being created and then deleted in ONTAP.
HCG-NetApp-C400-E9U9::> consistency-group show -vserver openshift
(vserver consistency-group show)
Parent
Consistency Consistency
Vserver Group Group State Size Available Used
---------- ------------- ------------- ------- ---------- ---------- ---------
openshift cg-snapshot-82e42f0f-d421-4743
- online 22GB 21.99GB 7.46MB
HCG-NetApp-C400-E9U9::> consistency-group show -vserver openshift
(vserver consistency-group show)
There are no entries matching your query.
Restoring the PVCs of the VM from individual snapshots using Trident
Now let us assume that we have lost the sample.txt file from each of the 2 data disks.
[fedora@fedora-vm1 data1]$ pwd
/mnt/data1
[fedora@fedora-vm1 data1]$ ls
lost+found
[fedora@fedora-vm1 data2]$ pwd
/mnt/data2
[fedora@fedora-vm1 data2]$ ls
lost+found
Note: Although we created the snapshots of the group of volumes as a single unit, we can only restore from the individual snapshots.
We all know how to restore a volume from its snapshot using the ONTAP CLI or System Manager. But can we restore a volume from its snapshot using Trident? Yes, of course; let's see how.
Trident provides rapid, in-place volume restoration from a snapshot using the TridentActionSnapshotRestore (TASR) CR. This CR functions as an imperative Kubernetes action and does not persist after the operation completes.
First, stop the VM.
Now let's restore the content of the first disk/PVC with its corresponding snapshot using the yaml as shown below:
# cat tasr1.yaml
apiVersion: trident.netapp.io/v1
kind: TridentActionSnapshotRestore
metadata:
  name: trident-snap-disk1
  namespace: default
spec:
  pvcName: dv-fedora-vm1-disk1-ulsgg2
  volumeSnapshotName: snapshot-4d47c9f45423bfca625a0f1b6c5a5ec456ac59d3e583157be919bb7237317c65
# oc create -f tasr1.yaml
tridentactionsnapshotrestore.trident.netapp.io/trident-snap-disk1 created
Similarly, create another TASR object for the second disk using the PVC and its corresponding snapshot.
# cat tasr2.yaml
apiVersion: trident.netapp.io/v1
kind: TridentActionSnapshotRestore
metadata:
  name: trident-snap-disk2
  namespace: default
spec:
  pvcName: dv-fedora-vm1-disk2-86oq76
  volumeSnapshotName: snapshot-afb4c4833460e233c4e86f1108c921b86a6f4d0eb182e99e579081ff6f743f56
# oc create -f tasr2.yaml
Let us make sure that the restore operations show the Succeeded state.
[root@localhost volumeGroups]# oc get tasr
NAME NAMESPACE PVC SNAPSHOT STATE
trident-snap-disk1 default dv-fedora-vm1-disk1-ulsgg2 snapshot-4d47c9f45423bfca625a0f1b6c5a5ec456ac59d3e583157be919bb7237317c65 Succeeded
trident-snap-disk2 default dv-fedora-vm1-disk2-86oq76 snapshot-afb4c4833460e233c4e86f1108c921b86a6f4d0eb182e99e579081ff6f743f56 Succeeded
Now let us start the VM, log in to it, and confirm that the sample.txt file is back on both disks.
[fedora@fedora-vm1 ~]$ ls /mnt/data1
lost+found sample.txt
[fedora@fedora-vm1 ~]$ ls /mnt/data2
lost+found sample.txt
[fedora@fedora-vm1 ~]$
Conclusion
In this blog, I have demonstrated how to create a volume group snapshot of all the PVCs of a VM in OpenShift Virtualization as a single unit, and how to restore each snapshot individually using the TridentActionSnapshotRestore CR. This powerful feature ensures that your application data remains consistent and easily manageable, even across multiple volumes.
If you're interested in learning how to use the Volume Group Snapshot feature for container workloads on vanilla Kubernetes, I highly recommend checking out the lab scenario written by Yves Weisser. It's an excellent resource for understanding the application of this feature in different environments.
For more detailed information and comprehensive guides, please visit the Trident documentation page on working with volume group snapshots.