Tech ONTAP Blogs
Picture this: Your application is humming along, serving millions of users with ease, scaling up as demand spikes, and never missing a beat. That’s the dream Kubernetes delivers. Kubernetes—or K8s, as the cool kids call it—is the unsung hero orchestrating containerized applications at scale. It’s like the conductor of a symphony, ensuring that every note (or container) plays in harmony. But as Kubernetes has evolved from a geeky experiment to the engine of modern computing, its storage demands have surged. Enter NetApp® Trident™, the Container Storage Interface (CSI) driver that bridges Kubernetes with NetApp ONTAP® storage, so that your apps get the persistent, scalable storage they need to thrive.
But as workloads surged, we hit a snag—a bottleneck that threatened to slow the music down. This blog takes you behind the scenes of how we tackled this challenge for the Node part of the Trident (for the controller: Trident Controller Parallelism), the clever fix we engineered, and the jaw-dropping results that followed, turning Trident into a lean, mean, scalability machine ready for the Kubernetes revolution.
Kubernetes isn’t just growing—it’s booming. The stats tell the story. The Cloud Native Computing Foundation (CNCF) reports that Kubernetes adoption has surged, with 96% of enterprises using the platform in 2024, and 80% deploying it in production environments. A 2025 study from Mordor Intelligence shows that the market for Kubernetes is experiencing robust financial growth, projected to expand from US$2.57 billion in 2025 to US$7.07 billion by 2030, at a compound annual growth rate (CAGR) of 22.4%. This expansion is significantly driven by the increasing demand for managed services and the rapid rise of artificial intelligence and machine learning workloads, which increasingly rely on Kubernetes as their foundational infrastructure.
Why? Because it’s fast, flexible, and lets you scale like a boss.
But here’s the rub: As these workloads pile in, they need dynamic volume provisioning—storage that can scale up fast and flawlessly. Trident was built for this provisioning, but early on, it relied on a single global lock mechanism to manage requests.
Picture this: A busy airport with just one check-in counter for all flights. Passengers (requests) stack up, waiting their turn, even if they’re on different airlines or headed to different destinations. That’s what the Trident Node single global lock was like. Every provisioning request—attaching, mounting, unmounting, formatting—had to queue up, even if they were for unrelated volumes. That kept things orderly when Kubernetes workloads were light, but as the crowds grew, it turned into a scalability challenge. In a cluster with 500 pods, each needing a volume, a 1-second-per-request delay meant 500 seconds (over 8 minutes) of waiting. In today’s world of microsecond SLAs, that’s an eternity.
Serializing everything sacrificed speed for safety. It was like forcing all planes to take off one at a time, no matter how many runways were free. We needed a smarter way to manage the traffic.
Before we charge ahead with this epic Kubernetes storage adventure, let’s hit the brakes for just a second. I know I’m interrupting the flow, but trust me, this little detour will be worth it. The Node component of the CSI driver operates in accordance with a defined specification that governs its behavior, particularly through four primary gRPC calls: NodeStage, NodeUnstage, NodePublish, and NodeUnpublish. (These gRPC calls are made with respect to the node Pods.) These calls are crucial to the management of storage resources when a Pod with a Persistent Volume Claim (PVC) is deployed.
To elaborate, here’s the sequence of operations on the Node side:
1. NodeStage: The volume is attached to the node, formatted with a filesystem if needed, and mounted to a node-global staging path.
2. NodePublish: The staged volume is bind-mounted into the Pod’s target path, making it visible to the application.
3. NodeUnpublish: When the Pod goes away, the Pod-level mount is removed.
4. NodeUnstage: Once no Pod on the node is using the volume, the staging mount is torn down and the volume is detached.
These slick moves are powered by gRPC calls, a fancy way of saying that the CSI driver and Kubernetes Nodes chat back and forth like a well-rehearsed crew, making sure that every step is perfectly timed. When you deploy a Pod with a PVC, it’s like the director yelling “Action!” The CSI driver leaps in, running this choreography to get the volume ready, hooked up, and eventually tidied away. Now that you’ve got the behind-the-scenes magic, let’s jump back into the main plot.
We weren’t about to let a bottleneck stop us. Drawing inspiration from concurrency gurus like Katherine Cox-Buday (Concurrency in Go: Tips and Techniques for Developers) and Rob Pike’s Go concurrency talks, we engineered a two-step solution that’s as elegant as it is practical.
Out went the single lock; in came a per-volume lock keyed on each volume’s unique UUID. Think of it as giving each airline its own check-in counter, with further lanes split by destination. Requests for the same volume wait their turn (to avoid chaos), but requests for different volumes? They race ahead in parallel.
Check out the code:
lockContext := "NodeStageVolume"
defer locks.Unlock(ctx, lockContext, req.GetVolumeId())
if !attemptLock(ctx, lockContext, req.GetVolumeId(), csiNodeLockTimeout) {
    return nil, status.Error(codes.Aborted, "request waited too long for the lock")
}
These locks are placed at the start of the gRPC calls that we just discussed.
But freedom comes with a catch. If the kubelet (Kubernetes’ Node agent) flooded Trident with hundreds of requests at once—like formatting tons of volumes—it could clog the system. Imagine opening every airport gate at once, only to jam the security check-ins. Enter the limiter, our smart traffic cop. It caps how many requests can run simultaneously, tailored to the protocol (iSCSI, NAS) and operation (NodeStage, NodePublish, etc.). Check it out:
if err := p.limiterSharedMap[NodeStageNFSVolume].Wait(ctx); err != nil {
    return nil, err
}
defer p.limiterSharedMap[NodeStageNFSVolume].Release(ctx)
The maximum number of allowable requests is a configurable parameter in the code, but we don't currently expose it for users to modify. If demand for tuning it grows, exposing it would be a simple one- or two-line code change.
Currently, this parallelization has been enabled only for the NAS and SAN (iSCSI) protocols. In the upcoming Trident 25.10 release, it will be extended to support the remaining SAN protocols, FCP and NVMe.
Here are the limits that Trident is currently using:
maxNodeStageNFSVolumeOperations = 10
maxNodeStageSMBVolumeOperations = 10
maxNodeUnstageNFSVolumeOperations = 10
maxNodeUnstageSMBVolumeOperations = 10
maxNodePublishNFSVolumeOperations = 10
maxNodePublishSMBVolumeOperations = 10
maxNodeUnpublishVolumeOperations = 10
maxNodeStageISCSIVolumeOperations = 5
maxNodeUnstageISCSIVolumeOperations = 10
maxNodePublishISCSIVolumeOperations = 10
maxNodeExpandVolumeOperations = 10
So, did our fix work? You bet. Although exact metrics depend on your setup, here’s a taste of what we’ve seen.
These are the configurations under which these tests were recorded:
number_of_k8_nodes = Single
volume_count = 30
luks = Enabled
fs_type = ext4
formatting_options = None
access_mode = ReadWriteOnce
And here are some numbers with the added controller concurrency (for the controller: Trident Controller Parallelism), which was recently introduced as a tech preview in the Trident 25.06 release.
These figures also include the improvements observed from adding a few of the luks arguments to the luksFormat command. If you look closely, you'll notice that adding controller parallelism doesn't deliver the speedup you might expect. That’s because attaching a volume to a Pod hinges more on the Node than on the controller, with tasks like formatting and LUKS encryption accounting for a significant portion of the time.
Here are some numbers excluding LUKS formatting from our calculations, which more clearly highlight the performance gains we achieved purely through parallelism.
To use this feature, you don't need to do anything specific: just install or upgrade Trident, and it's business as usual. GitHub link: https://github.com/NetApp/trident
Trident is not just keeping up—it’s leading the charge.
Kubernetes is here to stay, and NetApp Trident is ready to rock the house. By ditching the single lock and embracing UUID-based locking and the limiter, we’ve built a storage solution that’s fast, scalable, and simple to maintain. Whether you’re running a handful of containers or thousands, Trident has your back.
Running Kubernetes with NetApp ONTAP? Give these upgrades a spin and tell us what you think—your feedback keeps us sharp!
References: