The NetApp DataOps Toolkit (DOTK) continues to evolve as a self-service data management framework for modern data and AI infrastructure. DOTK enables data engineering and AI platform teams to easily orchestrate reproducible, automated data management workflows.
With the release of NetApp DataOps Toolkit v2.7, NetApp introduces two strategic enhancements that expand its capabilities:
- Automated FlexCache provisioning for ONTAP and Kubernetes environments
- Native support for Google Cloud NetApp Volumes (GCNV)
Together, these features strengthen DOTK as a unified self-service framework for globally distributed AI workloads, large-scale experimentation, and hybrid cloud DataOps pipelines. This blog explores both enhancements in technical depth.
Understanding FlexCache Provisioning using DOTK 2.7
Historically, configuring FlexCache volumes required manual administrative steps in ONTAP. With v2.7, DOTK fully automates FlexCache provisioning through simple CLI commands and Python APIs. A FlexCache volume acts as a high-performance, low-latency cache of a remote ONTAP volume, making it ideal for read-intensive workloads such as AI training, analytics, and large-scale data preprocessing.
Example Usage of FlexCache Operation with DOTK-Traditional:
The NetApp DataOps Toolkit for Traditional Environments can be used to create a FlexCache volume in ONTAP. The command for creating a new FlexCache volume is:
netapp_dataops_cli.py create flexcache
To create a FlexCache volume on your ONTAP system named 'cache2' in SVM 'svm0' for the source volume 'source2' in the source SVM 'svm1', with a size of 500GB, run the following command:
netapp_dataops_cli.py create flexcache --flexcache-vol=cache2 --flexcache-svm=svm0 --source-volume=source2 --source-svm=svm1 --flexcache-size=500GB
Output:
Creating FlexCache: svm1:source2 -> svm0:cache2
FlexCache created successfully.
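The same FlexCache operation is also exposed through the toolkit's Python library. The minimal sketch below assumes a create_flexcache function whose keyword arguments mirror the CLI flags shown above; the exact function name and signature are assumptions, so verify them against the DOTK documentation.

from netapp_dataops import traditional

# Minimal sketch: create FlexCache 'cache2' in SVM 'svm0' that caches
# 'source2' from SVM 'svm1'. The function name and keyword arguments
# are assumptions that mirror the CLI flags; confirm them against the
# DOTK documentation before use.
traditional.create_flexcache(
    flexcache_volume_name="cache2",
    flexcache_svm_name="svm0",
    source_volume_name="source2",
    source_svm_name="svm1",
    size="500GB",
    print_output=True
)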
Example Usage of FlexCache Operation with DOTK-Kubernetes:
The NetApp DataOps Toolkit for Kubernetes can be used to create a FlexCache volume in ONTAP and then create a PV (PersistentVolume) and PVC (PersistentVolumeClaim) representing the FlexCache in Kubernetes. The command to create a FlexCache volume is:
netapp_dataops_k8s_cli.py create flexcache
DOTK for Kubernetes will automate the process of creating the FlexCache, creating the PV, creating the PVC, and binding the PVC in Kubernetes.
To create a PVC named 'test-cache-vol1' in the namespace 'trident', using the Trident backend 'ontap', that is bound to a FlexCache of the source volume 'test-vol1' in the source SVM 'svm0', with a size of 53GiB, run the following command:
netapp_dataops_k8s_cli.py create flexcache --flexcache-vol=test-cache-vol1 --source-vol=test-vol1 --source-svm=svm0 --flexcache-size=53Gi --backend-name=ontap --namespace=trident --trident-namespace=netapp-trident
Output:
Creating FlexCache: svm0:test-vol1 -> svm0:test-cache-vol1
FlexCache created successfully.
[K8s] Creating PV 'pv-test-cache-vol1' in namespace 'trident'...
[K8s] PV 'pv-test-cache-vol1' created successfully.
[K8s] Creating PVC 'test-cache-vol1' in namespace 'trident'...
[K8s] PVC 'test-cache-vol1' created successfully.
Waiting for Kubernetes to bind volume to PVC.
[K8s] PVC 'test-cache-vol1' is bound to PV 'pv-test-cache-vol1'.
Volume successfully created and bound to PersistentVolumeClaim (PVC) 'test-cache-vol1' in namespace 'trident'.
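Once the PVC is bound, any workload in the same namespace can consume the FlexCache like an ordinary Kubernetes volume. As an illustration, the snippet below uses the official kubernetes Python client to launch a pod that mounts the FlexCache-backed PVC; the pod name, container image, and mount path are arbitrary placeholders.

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a cluster

# Illustrative pod that mounts the FlexCache-backed PVC at /data
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="flexcache-reader", namespace="trident"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="reader",
                image="busybox",  # placeholder image
                command=["sh", "-c", "ls -l /data && sleep 3600"],
                volume_mounts=[client.V1VolumeMount(name="cache", mount_path="/data")]
            )
        ],
        volumes=[
            client.V1Volume(
                name="cache",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name="test-cache-vol1"
                )
            )
        ]
    )
)
client.CoreV1Api().create_namespaced_pod(namespace="trident", body=pod)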
The NetApp DataOps Toolkit can also be used to delete a FlexCache volume in ONTAP and remove its associated Kubernetes PVC and PV. The command to delete a FlexCache volume is:
netapp_dataops_k8s_cli.py delete flexcache-volume
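For example, to remove the FlexCache created above along with its PVC and PV, an invocation might look like the following. The flag names here are assumptions patterned on the create command and on the toolkit's other delete commands; run the command with --help to confirm the exact options.

netapp_dataops_k8s_cli.py delete flexcache-volume --pvc-name=test-cache-vol1 --namespace=trident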
Automated FlexCache provisioning delivers:
- Faster data access for distributed AI training jobs
- Improved scalability for global teams and multi-region workloads
- Consistent cache management without manual ONTAP intervention
This marks a major leap in simplifying data localization for high-performance AI and analytics workflows.
Understanding Native Support for Google Cloud NetApp Volumes (GCNV)
The second major enhancement is support for Google Cloud NetApp Volumes (GCNV), a fully managed file storage service from Google Cloud. With DOTK 2.7, users can now create, clone, snapshot, and manage volumes in GCNV, as well as automate dataset versioning and replication workflows.
The command to install the DataOps Toolkit with Google Cloud NetApp Volumes integration is:
python3 -m pip install 'netapp-dataops-traditional[gcp]'
Note: The [gcp] extra is required for GCNV functionality and will install google-cloud-netapp.
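Like other Google Cloud client libraries, google-cloud-netapp authenticates through Application Default Credentials. For local development, one way to set these up is with the gcloud CLI:

gcloud auth application-default login

In production, a service account with the appropriate NetApp Volumes IAM permissions is the more typical choice.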
Example Usage:
To create a new GCNV volume with NFS protocol support, execute the following Python code:
from netapp_dataops.traditional import gcnv

# Create a 100 GiB NFS volume
response = gcnv.create_volume(
    project_id="my-gcp-project",
    location="us-central1",
    volume_id="demo-volume",
    share_name="demo-share",
    storage_pool="my-storage-pool",
    capacity_gib=100,
    protocols=["NFSV3"],
    # Allow read/write NFSv3 access from the 10.0.0.0/24 subnet
    export_policy_rules=[{
        "allowed_clients": "10.0.0.0/24",
        "access_type": "READ_WRITE",
        "has_root_access": True,
        "nfsv3": True,
        "nfsv4": False
    }],
    description="Demo volume for testing"
)
Output:
{
    "status": "success",
    "details": "projects/my-gcp-project/locations/us-central1/volumes/demo-volume"
}
You can also perform other volume management operations, such as cloning and deleting volumes, creating and deleting snapshots, and configuring cross-region replication for disaster recovery and high availability.
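For instance, versioning a dataset before a training run might look like the sketch below. The create_snapshot function name and its keyword arguments are assumptions patterned on gcnv.create_volume() above; verify them against the toolkit documentation.

from netapp_dataops.traditional import gcnv

# Sketch: snapshot the demo volume so the exact dataset state can be
# restored or cloned later. Function name and arguments are assumptions
# patterned on gcnv.create_volume().
response = gcnv.create_snapshot(
    project_id="my-gcp-project",
    location="us-central1",
    volume_id="demo-volume",
    snapshot_id="training-run-001",
    description="Dataset state for training run 001"
)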
What This Unlocks Technically:
- Hybrid and Multi-Cloud ML Pipelines: Train in GCP while cloning or caching data from on-prem ONTAP.
- Dataset Reproducibility: Snapshots ensure consistent training data across experiments.
- High-Throughput, Low-Latency Storage: Ideal for AI training pipelines.
GCNV support significantly extends the cloud-native footprint of DOTK, making it a true multi-cloud DataOps automation tool.
Conclusion:
NetApp DataOps Toolkit v2.7 marks a strategic leap forward in simplifying scalable, automated, and cloud-ready data management operations. With powerful enhancements like FlexCache automation and native GCNV support, DOTK continues to empower data engineers, developers, and AI platform teams to accelerate innovation while maintaining compliance, performance, and operational ease across hybrid and multi-cloud environments.