Today’s world is driven by high-tech infrastructure, and organizations are pushing to extract the maximum return from their technology investments. The core principles of modern IT have evolved around automated resource allocation, collaborative platforms, scalability, cost-effectiveness, centralized access, and seamless user management. Organizations are constantly looking for platforms and technologies that can meet these needs and support current and next-generation workloads.
A perfect convergence of all these needs is JupyterHub, which has emerged as a vital tool for fostering collaboration and enhancing productivity across data-driven teams. JupyterHub is a scalable multiuser platform that enables organizations to deploy and manage Jupyter Notebooks for various kinds of users. Users can work in isolated environments while securely and efficiently accessing shared computational resources.
In the world of artificial intelligence and machine learning (AI/ML), JupyterHub has become a cornerstone for data scientists, analysts, researchers, and AI practitioners by providing a collaborative and interactive environment for exploring data, building models, and sharing insights—all enabled by the foundational workflows of sharing notebooks, code, and results in a controlled environment.
Additionally, JupyterHub’s ability to integrate with enterprise authentication systems and cloud services ensures that it aligns with organizational security policies and scalability needs. By centralizing notebook management and streamlining workflows, JupyterHub enhances data-driven decision-making and accelerates innovation, making it an indispensable component of contemporary enterprise IT infrastructure.
Simply put, JupyterHub and Jupyter Notebooks have made their impact across a wide spectrum of user personas in the modern IT landscape. However, in working with JupyterHub, one of the core challenges is effectively presenting and managing data across different users in a shared environment. Whether it’s loading data for ML experiments, analyzing data, or presenting results, the key to enabling collaboration and efficiency is the way data is ingested, shared, and visualized within JupyterHub.
This is where Google Cloud NetApp Volumes steps in!
Google Cloud NetApp Volumes is a fully managed cloud file storage solution that allows users to easily host and manage their data on an enterprise-grade, high-performance storage system with support for NFS and SMB protocols.
To explore this combination of JupyterHub and NetApp Volumes, this blog discusses highlights, integration points, implementation, and typical day-to-day workflows.
Why Google Cloud NetApp Volumes for JupyterHub?
Google Cloud NetApp Volumes is a powerful, scalable, and flexible file storage service offering a range of features that make it well suited as a data plane for JupyterHub.
Persistent storage. Keep Jupyter Notebooks and data available, even when the user’s server pods are restarted. Google Cloud NetApp Volumes maintains data integrity and automatic reconnection with the new pods during restarts, which is crucial for continuity and preserving the state of the projects that are running on JupyterHub.
Scalability and performance. Scale effortlessly to accommodate growing data needs and demanding workloads. The high-performance storage capabilities of NetApp Volumes make data access for Jupyter Notebooks smooth and efficient, even in dealing with large datasets. Users can base storage provisioning for their notebooks on the performance requirements by choosing the appropriate service levels.
Integration with Kubernetes. Integrate seamlessly with Kubernetes through the NetApp® Trident™ Container Storage Interface (CSI) driver, simplifying the process of provisioning and managing persistent volumes for the JupyterHub deployment. A persistent volume claim (PVC) is automatically created when a Jupyter Notebook is spun up, and a persistent volume is provisioned through the Trident CSI driver and bound to the PVC. Each user is allocated a dedicated amount of data storage for their private use.
Security and data protection. Use robust security features, including encryption and access control, to protect sensitive data. With support for NetApp Snapshot™ copies and backups, recover data quickly in case of accidental deletion or system failures. By leveraging these capabilities, you can protect Jupyter Notebooks and their persistent data by using Snapshot copies and backups, and you can support business-critical notebooks with a business continuity and disaster recovery plan.
Cost-effectiveness. NetApp Volumes is a cost-effective solution for storing data in the cloud, reducing the need for additional storage infrastructure and minimizing overall costs. For large-scale datasets, optimize the storage footprint by auto-tiering unused data to lower-cost storage while continuing to maintain the same file system view to the client.
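The Kubernetes integration described above can be sketched as a PersistentVolumeClaim against a Trident-backed storage class. This is an illustrative example only; the class name gcnv-nfs-sc and the claim name are assumptions, not taken from an actual deployment:

```yaml
# Illustrative per-user claim; Trident provisions a volume and binds it automatically.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-arvind              # hypothetical per-user claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gcnv-nfs-sc   # assumed Trident-backed storage class
  resources:
    requests:
      storage: 10Gi               # matches the default per-user workspace size
```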
Implementing Google Cloud NetApp Volumes with JupyterHub
Setting up a JupyterHub environment with Google Cloud NetApp Volumes is a three-step process:
Configure a Kubernetes cluster with the Trident CSI and Google Cloud NetApp Volumes as a back end. For detailed instructions, refer to the NetApp Trident with Google Cloud NetApp Volumes blog.
Create a storage class in the Kubernetes cluster by using Trident CSI as the provisioner.
Deploy JupyterHub on the previously created Kubernetes cluster by following the Zero to JupyterHub with Kubernetes approach.
As part of the deployment, update the config.yaml file for JupyterHub to point to the storage class that was created earlier.
From here on, all storage needs for JupyterHub are serviced by Google Cloud NetApp Volumes.
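As an illustrative sketch of steps 2 and 3 (not the exact configuration from the original setup), the storage class and the matching config.yaml fragment might look like the following; the class name gcnv-nfs-sc and the backendType value are assumptions that must match your Trident back-end configuration:

```yaml
# Storage class using the Trident CSI provisioner (illustrative)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gcnv-nfs-sc
provisioner: csi.trident.netapp.io
allowVolumeExpansion: true
parameters:
  backendType: "google-cloud-netapp-volumes"  # assumed back-end driver name
---
# config.yaml fragment for JupyterHub: point single-user storage at the class
singleuser:
  storage:
    capacity: 10Gi                # default per-user workspace size
    dynamic:
      storageClass: gcnv-nfs-sc
```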
A typical implementation for a team of data scientists
Consider this scenario: Arvind, Steve, and Junior are working on an AI project. Although they each have their own user space, they need to collaborate over a common dataset for their data science operations.
Each of them is assigned a dedicated personal storage space, 10GiB by default, for their workspace.
A PVC for a 500GiB shared storage space that caters to AI workloads is serviced by NetApp Volumes through a storage class gcnv-nfs-perf-sc that maps to a high-performance storage tier. This volume will host the dataset that the team will use for their AI/ML operations.
To present this high-performance shared storage to the user spaces, the JupyterHub configuration is updated through the config.yaml file.
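The configuration snippet itself isn’t reproduced here, so what follows is a hedged sketch of how such a shared PVC could be surfaced in every user pod through Zero to JupyterHub’s singleuser.storage settings; the claim name shared-dataset-pvc and the mount path /home/shared are illustrative assumptions:

```yaml
# config.yaml fragment: mount an existing shared PVC into each single-user pod
singleuser:
  storage:
    extraVolumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-dataset-pvc   # assumed name of the 500GiB shared PVC
    extraVolumeMounts:
      - name: shared-data
        mountPath: /home/shared           # assumed mount point in each user space
```

After config.yaml is updated, the change is applied by upgrading the JupyterHub Helm release.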
For these changes to be effective, upgrade the JupyterHub deployment by using helm upgrade:
helm upgrade <helm-release-name> jupyterhub/jupyterhub --version=<chart-version> -n <namespace-name> --values config.yaml
The shared volume is then available within all the user spaces, facilitating seamless collaboration.
Conclusion
The integration of Google Cloud NetApp Volumes with JupyterHub offers a powerful and flexible solution for managing data in cloud-based application development, data science, and ML workflows. By combining NetApp Volumes’ robust, scalable storage capabilities with JupyterHub’s collaborative, multiuser environment, teams can efficiently access, store, and work on large datasets while maintaining seamless integration with cloud-native tools. This integration enhances performance, data accessibility, and collaboration, and it allows developers, data scientists, and researchers to focus on their work rather than managing infrastructure. As businesses continue to adopt cloud-first strategies, this integration provides a scalable, reliable, and cost-effective solution for cutting-edge computing and storage needs.
Electronic design automation (EDA) workloads are growing exponentially, and shared storage is a critical component when running EDA compute jobs. To make your EDA jobs go faster, Google Cloud NetApp Volumes introduced the Large Volumes feature, which offers the capacity and performance scalability that modern EDA design processes require.
This post introduces an automated Amazon CloudWatch dashboard that simplifies monitoring FSx for ONTAP file systems. This pre-configured dashboard integrates critical insights for FSx for ONTAP directly into a single view, making it easier to track performance and detect issues in real time, all from within the AWS console.
Authorize access to your data and encrypt your NFS data in transit with Kerberos in Google Cloud NetApp Volumes! This article describes Kerberos for NFS and shows how to set it up in 10 steps so that you can validate Kerberos for your high-security needs.
With the 24.10 release of the NetApp® Trident™ storage provisioner, we are now supporting Fibre Channel Protocol (FCP) in the ontap-san driver (sanType: fcp). This support enables customers to leverage their existing infrastructure investments in FC for modern workloads like running containers or virtual machines on Kubernetes/OpenShift Virtualization. FC is a technology preview feature of Trident, enabling customers to test the new functionality in nonproduction environments.
For a full list of new features in Trident 24.10, read the announcement blog post.
In this blog, we’ll briefly show you how to configure Trident with an FC back end and demonstrate volume snapshot and resize operations.
Prerequisites
This blog post assumes the following:
You have a Kubernetes cluster and its associated kubeconfig.
Zoning is done on your FC switches and your system that’s running NetApp ONTAP® software, following the ONTAP documentation.
Trident is installed on the Kubernetes cluster, and the cluster nodes are prepared according to the Trident documentation.
You have access to a workstation that has kubectl configured to use the kubeconfig and that has tridentctl CLI installed.
Trident configuration
We start by configuring the ONTAP back end with SAN drivers using this back-end configuration in JSON format. We provide the access details and the storage driver name ontap-san (which is common across iSCSI, NVMe, and FCP), and we set sanType to fcp.
$ cat 1_fcp_backend.json
{
"version": 1,
"storageDriverName": "ontap-san",
"managementLIF": "172.16.100.98",
"svm": "svm1",
"username": "admin",
"password": "Ab0xB@wks!",
"sanType": "fcp",
"useREST": false,
"backendName": "LXRRRxxbfr",
"instanceName": "LXRRRxxbfr"
}
This tridentctl command creates the back end LXRRRxxbfr:
$ tridentctl create backend -f 1_fcp_backend.json -n trident
+------------+----------------+--------------------------------------+--------+------------+---------+
| NAME | STORAGE DRIVER | UUID | STATE | USER-STATE | VOLUMES |
+------------+----------------+--------------------------------------+--------+------------+---------+
| LXRRRxxbfr | ontap-san      | dd2110ac-d412-4c93-9a24-85a86b0c80f5 | online | normal     | 0       |
+------------+----------------+--------------------------------------+--------+------------+---------+
With the back end in place and online, we create a corresponding storage class basic-fcp to dynamically provision persistent volumes later.
$ cat 2_storage-class-basic.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: basic-fcp
provisioner: csi.trident.netapp.io
allowVolumeExpansion: true
parameters:
backendType: "ontap-san"
fsType: "ext4"
$ kubectl create -f ./2_storage-class-basic.yaml
storageclass.storage.k8s.io/basic-fcp created
We also create a snapshot class csi-snapclass for later use in snapshot creation.
$ cat 2a_snapshot-class-basic.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete
$ kubectl create -f ./2a_snapshot-class-basic.yaml
volumesnapshotclass.snapshot.storage.k8s.io/csi-snapclass created
Checking on the ONTAP console, we see that our worker node’s (sti-rx2540-266) worldwide port names (WWPNs) are not yet registered in ONTAP initiator groups (igroups), because we have not created any workload pods yet.
$ hostname
sti-rx2540-266.ctl.gdl.englab.netapp.com
stiA300-2911726562639::> igroup show -vserver svm1
Vserver Igroup Protocol OS Type Initiators
--------- ------------ -------- ------- --------------------------------------------
svm1 sti-c210-347 fcp linux 21:00:00:24:ff:27:de:1a
21:00:00:24:ff:27:de:1b
svm1 sti-rx2540-263
fcp linux 10:00:00:10:9b:1d:73:7c
10:00:00:10:9b:1d:73:7d
svm1 sti-rx2540-263.ctl.gdl.englab.netapp.com-fcp-c38dbd10-f2a9-4762-b553-f
871fef4f7a7
fcp linux 10:00:00:10:9b:1d:73:7c
10:00:00:10:9b:1d:73:7d
21:00:00:24:ff:48:fb:14
21:00:00:24:ff:48:fb:15
svm1 trident mixed linux 18:00:00:10:9b:dd:dd:dd
4 entries were displayed
On the worker node, let’s check the multipath output to confirm that there’s only one device mapper (DM) device before we create persistent volumes and pods on the worker node:
$ multipath -ll
3600a098038303048783f4a7148556f2d dm-0 NETAPP, LUN C-Mode
size=40G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=10 status=enabled
| |- 10:0:0:0 sda 8:0 active ready running
| '- 12:0:0:0 sdc 8:32 active ready running
'-+- policy='service-time 0' prio=50 status=active
|- 10:0:1:0 sdb 8:16 active ready running
'- 12:0:1:0 sdd 8:48 active ready running
Test volume creation and snapshot operations
Now let’s create a workload pod and a PersistentVolumeClaim (PVC) by applying this manifest:
$ cat 3_sts_with_pvc.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: my-statefulset
spec:
replicas: 1
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
volumeMounts:
- name: basic-fcp-pvc
mountPath: /mnt/basic-san
volumeClaimTemplates:
- metadata:
name: basic-fcp-pvc
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "basic-fcp"
resources:
requests:
storage: 20Mi
$ kubectl create -f ./3_sts_with_pvc.yaml
statefulset.apps/my-statefulset created
We confirm that the PVC was created:
$ kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default     basic-fcp-pvc-my-statefulset-0   Bound    pvc-fed5f2fa-9841-4313-a619-1548e9457bce   20Mi   RWO   basic-fcp   4s
And after a few seconds, the pod is up and running:
$ kubectl get po -A | grep statefulset
NAMESPACE NAME READY STATUS RESTARTS AGE
default     my-statefulset-0   1/1     Running   0          16s
After the pod is up, we can see that there’s an additional DM device dm-5 on our worker node:
$ multipath -ll
3600a098038313768583f58394343506a dm-5 NETAPP, LUN C-Mode
size=20M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
'-+- policy='service-time 0' prio=50 status=active
|- 11:0:0:0 sde 8:64 active ready running
'- 13:0:0:0 sdf 8:80 active ready running
3600a098038303048783f4a7148556f2d dm-0 NETAPP, LUN C-Mode
size=40G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=10 status=enabled
| |- 10:0:0:0 sda 8:0 active ready running
| '- 12:0:0:0 sdc 8:32 active ready running
'-+- policy='service-time 0' prio=50 status=active
|- 10:0:1:0 sdb 8:16 active ready running
'- 12:0:1:0 sdd 8:48 active ready running
In the ONTAP console, we also see that the igroup now has a new entry, with an igroup name prefixed by the worker node’s host name, and its WWPNs. If a ReadWriteMany (RWX) volume were attached to a second worker node B, that node would get its own igroup. This is called a “per-node igroup,” in line with Trident’s behavior for iSCSI.
stiA300-2911726562639::> igroup show -vserver svm1
Vserver Igroup Protocol OS Type Initiators
--------- ------------ -------- ------- --------------------------------------------
svm1 sti-c210-347 fcp linux 21:00:00:24:ff:27:de:1a
21:00:00:24:ff:27:de:1b
svm1 sti-rx2540-263
fcp linux 10:00:00:10:9b:1d:73:7c
10:00:00:10:9b:1d:73:7d
svm1 sti-rx2540-263.ctl.gdl.englab.netapp.com-fcp-c38dbd10-f2a9-4762-b553-f
871fef4f7a7
fcp linux 10:00:00:10:9b:1d:73:7c
10:00:00:10:9b:1d:73:7d
21:00:00:24:ff:48:fb:14
21:00:00:24:ff:48:fb:15
svm1 sti-rx2540-266.ctl.gdl.englab.netapp.com-fcp-8017638f-8e81-419b-8202-c
c16678b394f
fcp linux 10:00:00:90:fa:cd:fd:c0
10:00:00:90:fa:cd:fd:c1
21:00:00:24:ff:30:02:a2
21:00:00:24:ff:30:02:a3
svm1 trident mixed linux 18:00:00:10:9b:dd:dd:dd
Now we will create a snapshot and try to clone a PVC from it.
To create the volume snapshot ontap-fcp-snapshot of the PVC created earlier, we use this manifest:
$ cat 4_fcp_ontap-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: ontap-fcp-snapshot
spec:
volumeSnapshotClassName: csi-snapclass
source:
persistentVolumeClaimName: basic-fcp-pvc-my-statefulset-0
$ kubectl create -f 4_fcp_ontap-snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/ontap-fcp-snapshot created
The snapshot is ready after a few seconds:
$ kubectl get volumesnapshot
NAME                 READYTOUSE   SOURCEPVC                        SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
ontap-fcp-snapshot   true         basic-fcp-pvc-my-statefulset-0                           20Mi          csi-snapclass   snapcontent-69f30df6-353d-444b-8a8c-fe4fe3693206   4s             4s
And we’re ready to create a clone PVC ontap-fcp-pvc-from-snapshot out of that snapshot by using this manifest:
$ cat 5_fcp_ontap-pvc-from-snapshot.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ontap-fcp-pvc-from-snapshot
spec:
accessModes:
- ReadWriteOnce
storageClassName: basic-fcp
resources:
requests:
storage: 20Mi
dataSource:
name: ontap-fcp-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
$ kubectl create -f 5_fcp_ontap-pvc-from-snapshot.yaml
persistentvolumeclaim/ontap-fcp-pvc-from-snapshot created
Checking the list of PVCs, we see that the second entry is the PVC created from the snapshot, so our cloned PVC was created successfully.
$ kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGE CLASS AGE
default basic-fcp-pvc-my-statefulset-0 Bound pvc-fed5f2fa-9841-4313-a619-1548e9457bce 20Mi RWO basic-fcp 73s
default ontap-fcp-pvc-from-snapshot Bound pvc-c3bf4ffc-7b11-4ef7-8334-895b971d61db 20Mi RWO basic-fcp 6s
Test volume resize operation
As a last test, we use the following manifest to create another FCP-backed PVC basic-fcp-pvc of size 1GiB for going through a resizing workflow:
$ cat pvc-basic-1.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: basic-fcp-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: basic-fcp
$ kubectl create -f ./pvc-basic-1.yaml
persistentvolumeclaim/basic-fcp-pvc created
We attach a pod/deployment to the newly created PVC:
$ cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-deployment
labels:
app: nginx
spec:
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx:latest
name: nginx
volumeMounts:
- mountPath: /usr/share/nginx/html
name: nginx-data
volumes:
- name: nginx-data
persistentVolumeClaim:
claimName: basic-fcp-pvc
$ kubectl create -f ./deployment.yaml
deployment.apps/test-deployment created
And we wait for the pod to come up:
$ kubectl get po -A | grep test
NAMESPACE NAME READY STATUS RESTARTS AGE
default test-deployment-6bc4596cbc-zqv12 1/1 Running 0 14s
Now let’s patch the PVC basic-fcp-pvc to a size of 2GiB from the initial 1GiB:
$ kubectl patch pvc basic-fcp-pvc -p '{"spec": {"resources": {"requests": {"storage": "2Gi"}}}}'
persistentvolumeclaim/basic-fcp-pvc patched
For the resize operation to become effective, we need to restart the pod, so let’s delete it:
$ kubectl delete po test-deployment-6bc4596cbc-zqv12
pod "test-deployment-6bc4596cbc-zqv12" deleted
When the new pod comes up, the resize operation has happened.
$ kubectl get po -A | grep test
default test-deployment-6bc4596cbc-cps8q 1/1 Running 0 5s
Finally, we confirm the successful resize by querying the PVCs and seeing that the capacity is now 2GiB:
$ kubectl get pvc -A | grep basic-fcp-pvc
default basic-fcp-pvc Bound pvc-20414113-7630-4dcb-b522-e89853ac77f3 2Gi RWO basic-fcp 57s
default     basic-fcp-pvc-my-statefulset-0   Bound    pvc-fed5f2fa-9841-4313-a619-1548e9457bce   20Mi   RWO   basic-fcp   2m24s
default ontap-fcp-pvc-from-snapshot Bound pvc-c3bf4ffc-7b11-4ef7-8334-895b971d61db 20Mi RWO basic-fcp 77s
Conclusion
In summary, we configured an FC back end with Trident on a Kubernetes cluster. We then created multiple persistent volumes on the FC back end and showed that storage operations like creating snapshots, cloning, and resizing work as expected.