By Mohammad Hossein Hajkazemi, Bhushan Jain, Arpan Chowdhry
Introduction
Google Cloud NetApp Volumes (GCNV) is a fully managed, cloud-native storage service for Google Cloud that delivers high-performance, enterprise-grade storage with advanced data management features. NetApp’s ONTAP storage operating system serves as a unified solution supporting file and block protocols, while delivering robust enterprise-grade data management capabilities over standard NAS and SAN protocols for GCNV.
In this phase of the solution, we are introducing the iSCSI block service under the GCNV Flex service level. Benchmarking, performance analysis, and continuous improvement are critical for cloud storage customers to ensure their workloads are being served with the lowest possible latency and highest possible throughput. This document outlines the core principles of storage system performance analysis, details the bottlenecks encountered, and describes the optimizations implemented to enhance the performance of the GCNV block service.
Evaluating the Performance of a Storage System
When assessing the performance of a storage system, it’s crucial to understand how it behaves with different types of workloads. Storage systems exhibit varying performance characteristics depending on the size and pattern of data access. To capture these variations succinctly and meaningfully, we rely on a set of standardized microbenchmarks known as the 4-corner microbenchmarks.
The 4-corner microbenchmarks focus on four fundamental types of I/O operations that represent the extremes or "corners" of typical storage workloads:
64KiB Sequential Read
64KiB Sequential Write
8KiB Random Read
8KiB Random Write
These benchmarks test the system’s ability to handle large, contiguous data transfers (sequential) as well as small, scattered data accesses (random), for both reading and writing. Sequential operations involve reading or writing large blocks of data in a continuous stream. This pattern is common in media streaming, backups, and large file transfers. Random operations involve accessing small blocks of data scattered across the storage medium, typical in database queries, virtual machine disk operations, and general-purpose file sharing workloads.
While storage workloads can be infinitely varied, the 4-corner microbenchmarks are sufficient for a high-level performance evaluation because:
They represent the extremes of access patterns: Most real-world workloads fall somewhere between purely sequential and purely random, and between large and small I/O sizes.
They capture key performance trade-offs: For example, a storage system might have excellent sequential throughput but poor random IOPS, or vice versa.
They simplify benchmarking: Instead of running dozens of tests with varying block sizes and access patterns, these four tests provide a manageable yet informative set of metrics.
They enable comparison: Using standardized benchmarks allows for apples-to-apples comparisons across systems and configurations.
Our Measurement Methodology
Evaluating the performance of a storage system requires a methodology that ensures reliable and representative results for real-world workloads. This evaluation focuses on the four fundamental microbenchmarks—64KiB sequential read and write, and 8KiB random read and write—to capture the key performance characteristics of the system.
Benchmarks are generated using sio, a NetApp proprietary tool similar to the industry-standard tool fio, which allows precise control over the rate of I/O operations per second (ops/s). The load is increased gradually in a controlled manner to observe how the system scales and where it reaches saturation. Multiple load-generating client instances are deployed, and each client creates multiple sio processes concurrently, simulating a distributed workload and enabling evaluation of multi-client performance and contention effects.
During the performance test, statistics are gathered continuously for the different ONTAP subsystems and cloud resources. Results are visualized by plotting ops-latency or MiB/s-latency curves, which show the relationship between achieved throughput and response time. Analyzing these curves alongside the collected statistics helps identify peak performance points and potential bottlenecks such as CPU saturation, network limitations, disk contention, or software limitations.
The measurement philosophy reflects a deep understanding of the storage system’s internal behavior. For sequential reads, the readahead engine pre-populates the cache by reading data ahead of requests. The benchmark ensures that this engine fetches data directly from the disk, measuring true backend throughput rather than benefiting from cache hits. For random reads, each request reads directly from the disk, bypassing cache effects to measure raw random-access performance. For both random and sequential writes, we overwrite existing data to guarantee actual write operations instead of simply appending or writing to empty space, which could skew the results.
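Since sio is a NetApp proprietary tool, readers who want to reproduce a comparable 4-corner sweep can use the open-source fio tool instead. The sketch below drives fio from Python; the target path, queue depths, and runtime are illustrative assumptions, not the settings used in our measurements.

```python
# Illustrative 4-corner sweep using the open-source fio tool.
# The target path, queue depths, and runtime are assumptions for demonstration
# only; they are not the settings used in the measurements described above.
import json
import subprocess

TARGET = "/dev/sdX"          # hypothetical iSCSI LUN under test (writes overwrite in place)
CORNERS = {
    "seq_read_64k":  {"rw": "read",      "bs": "64k"},
    "seq_write_64k": {"rw": "write",     "bs": "64k"},
    "rand_read_8k":  {"rw": "randread",  "bs": "8k"},
    "rand_write_8k": {"rw": "randwrite", "bs": "8k"},
}

def run_corner(name, spec, iodepth):
    """Run one corner at a given queue depth and return (IOPS, mean latency in ms)."""
    cmd = [
        "fio", f"--name={name}", f"--filename={TARGET}",
        f"--rw={spec['rw']}", f"--bs={spec['bs']}",
        "--direct=1",                      # bypass the client page cache
        "--ioengine=libaio", f"--iodepth={iodepth}",
        "--time_based", "--runtime=60",
        "--output-format=json",
    ]
    result = json.loads(subprocess.run(cmd, capture_output=True, text=True).stdout)
    job = result["jobs"][0]
    side = "read" if "read" in spec["rw"] else "write"
    iops = job[side]["iops"]
    lat_ms = job[side]["clat_ns"]["mean"] / 1e6
    return iops, lat_ms

# Ramp the offered load by increasing the queue depth and record each point
# of the ops-latency curve.
for name, spec in CORNERS.items():
    for iodepth in (1, 4, 16, 64):
        iops, lat_ms = run_corner(name, spec, iodepth)
        print(f"{name:14s} qd={iodepth:3d}  {iops:10.0f} IOPS  {lat_ms:6.2f} ms")
```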
Performance Summary
The following table presents a summary of the performance outcomes for the 4-corners methodology outlined previously. It includes results for both sequential and random operations. Sequential workload performance is reported in MiB/s, while random workloads are measured in Ops/s. Latency metrics have been recorded at the GCNV nodes.
| Workload | Performance @ 1ms | Performance Bottleneck |
| --- | --- | --- |
| 64KiB Sequential Read (MiB/s) | 4,700 | VM MiB/s entitlement limit |
| 64KiB Sequential Write (MiB/s) | 1,800 | Journal Replication Latency |
| 8KiB Random Read (Ops/s) | 160,000 | GCNV service limits |
| 8KiB Random Write (Ops/s) | 113,000 | Journal Replication Latency |
Enhancing Performance Throughout the Development Cycle
This section outlines the methods employed to enhance the performance of various micro-benchmarks.
Random Read Performance
NetApp Volumes benefit from an external cache (EC) that delivers superior performance for read workloads. The capacity of this cache is substantially greater than that of the buffer cache, and read operations served through it do not count towards VM disk entitlement limits, thereby allowing significant scalability in read performance. However, the workload's working set can be significantly larger than the cache, causing some percentage of the requests to be served by the disks. Since most customer dataset sizes fall somewhere in between, we covered both scenarios: all requests served by the disk (100% cache miss) and all requests served by the cache (100% cache hit).
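To illustrate how these two extremes bound the behavior of real workloads, the back-of-the-envelope sketch below interpolates average read latency across cache hit ratios. The latency figures are placeholder assumptions, not measured GCNV values.

```python
# Back-of-the-envelope sketch: how the expected average read latency shifts
# between the two scenarios we measured (100% cache hit vs. 100% cache miss).
# The latency figures below are placeholders, not measured GCNV values.
CACHE_HIT_LATENCY_MS = 0.3    # assumed external-cache read latency
DISK_LATENCY_MS = 1.5         # assumed backend-disk read latency

def expected_latency_ms(hit_ratio: float) -> float:
    """Weighted-average latency for a workload with the given cache hit ratio."""
    return hit_ratio * CACHE_HIT_LATENCY_MS + (1 - hit_ratio) * DISK_LATENCY_MS

for hit in (0.0, 0.5, 0.9, 1.0):
    print(f"hit ratio {hit:4.0%}: ~{expected_latency_ms(hit):.2f} ms average read latency")
```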
100% Cache Miss
When running a random read benchmark, our performance goal is to fully saturate either the storage IOPS entitlement or compute. The initial 8KiB random read measurements showed lower-than-expected performance, as neither storage nor compute was saturated. Although more disk IOPS were available under the entitlement, the maximum observed usage reached only 80% of that limit.
The first round of bottleneck analysis revealed that the thread pool serving disk operations was not sized adequately: while n disks were attached to the ONTAP cluster, only n/2 such threads were spawned. Increasing the thread pool size led to a noticeable improvement in performance, although the impact was somewhat limited. We were still not able to fully saturate either the disk IOPS or compute, indicating that the bottleneck had shifted, which motivated us to continue working on this issue.
Following this, we collaborated with the disk driver development team to address the identified bottleneck. Together, we implemented an optimization that batches acknowledgments (ACKs) from the backend disk before relaying them to the upper-level stack, rather than processing each acknowledgment individually. This amortizes the overhead of interrupt handling over multiple ACKs. With this approach, ONTAP could fully saturate the disks, improving performance. As a result of all the measurements, analysis, and improvements, the peak performance was enhanced by approximately 23% compared to the initial results.
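The sketch below is a simplified cost model, not ONTAP or disk driver code, showing why batching completions helps: a fixed per-interrupt overhead is amortized across the ACKs delivered in each batch. The overhead and per-ACK costs are assumed values for illustration.

```python
# Illustrative model only (not ONTAP driver code): amortizing a fixed
# per-interrupt overhead across a batch of disk acknowledgments.
INTERRUPT_OVERHEAD_US = 5.0   # assumed fixed cost to wake and dispatch a handler
PER_ACK_COST_US = 1.0         # assumed cost to process a single completion

def cpu_cost_us(total_acks: int, batch_size: int) -> float:
    """Total CPU time to process completions when ACKs arrive in batches."""
    batches = -(-total_acks // batch_size)   # ceiling division
    return batches * INTERRUPT_OVERHEAD_US + total_acks * PER_ACK_COST_US

for batch in (1, 4, 16):
    print(f"batch={batch:2d}: {cpu_cost_us(100_000, batch) / 1e6:.2f} s of CPU per 100k ACKs")
```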
100% Cache Hit
Our initial 8KiB random read benchmark measurement, with the optimized external cache (EC) enabled, showed strong performance that could be further improved by right-sizing the iSCSI LUN handler thread pool. However, after making a few code changes to improve other workloads, we noticed a significant performance regression. Our analysis indicated that the degradation was caused by scheduling overhead from enabling some helper threads; this thread pool is meant to enhance sequential read performance when sequential blocks are read from the external cache. To address this, we disabled these threads while ensuring that sequential read performance was not adversely affected. The peak performance was improved by approximately 25%.
The Figure[1] below shows the performance improvement progression for random reads.
[1] Please refer to the Appendix section for an explanation of how the OPS-latency graphs are generated.
Sequential Read Performance
Although we expected no performance harm when running sequential reads with the external cache (using SSD or NVMe devices), we observed a performance degradation. We found that ONTAP uses small block sizes for external cache (EC) reads and writes, which makes large block operations inefficient due to I/O amplification.
The EC-Bypassing feature was introduced to let large read requests skip the external cache and is activated adaptively in response to defined response time thresholds. Since the read-ahead engine already pre-fetches large sequential reads, EC offers no extra benefit. After enabling bypassing, no notable improvement was seen. Reviewing EC-related runtime statistics showed that EC-Bypass was ineffective because EC response times remained in the acceptable range. The actual bottleneck stemmed from saturated interrupt threads processing the completion queue, not EC saturation. Offloading completions to helper threads improved performance and resolved the bottleneck, but this approach couldn't be adopted long-term as it caused performance regressions in other benchmarks.
Finally, we changed the local caching policy so that only randomly written and read data can be cached; under the older policy, sequentially read and written data would also be cached. Changing the policy recovered the performance, as shown in the graph below, since almost all sequentially read data was read ahead and prefetched from the backend disk instead of the external cache. The OPS-latency graph below demonstrates how sequential read performance improved by more than 2x through the steps explained above.
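The sketch below captures the idea behind the policy change in a simplified form; it is not ONTAP's implementation. A block is admitted into the local cache only when the access does not look sequential, since sequential reads are already covered by the readahead engine reading from the backend disk.

```python
# Simplified sketch of the caching-policy idea (not ONTAP's actual code):
# admit a block into the local cache only when the access is not part of a
# sequential stream, because sequential reads are already served by readahead.
class CacheAdmission:
    def __init__(self, block_size: int):
        self.block_size = block_size
        self.last_offset = None

    def should_cache(self, offset: int) -> bool:
        sequential = (
            self.last_offset is not None
            and offset == self.last_offset + self.block_size
        )
        self.last_offset = offset
        return not sequential

policy = CacheAdmission(block_size=64 * 1024)
for off in (0, 65536, 131072, 8_388_608):   # three sequential reads, then a jump
    print(off, "cache" if policy.should_cache(off) else "skip")
```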
Sequential Write Performance
One observation was that the performance of our block measurements was slightly lower compared to the file protocol-based measurements. Our analysis indicated that this discrepancy is due to the higher latency of journal replication to a high-availability node. This motivated us to investigate parameter tuning, including the flow control settings for journal mirroring. Ultimately, our investigation revealed that the current settings are appropriate.
During our investigation into the performance discrepancy between block and file protocol measurements, we identified an approximately 34% performance regression caused by recent changes in the file system cleaner module. After consulting with the file system team, we decided to revert and rework these changes. As a result, performance was restored.
While investigating potential performance improvements, we implemented a few other enhancements based on our observations. Here are the two most important ones:
We noticed that the provisioned journal size was lower than expected, which could negatively impact the performance of write workloads. We modeled and recommended the correct size to address this issue.
Datasets with low compressibility had a redundant process that increased CPU cost per operation, which we addressed with an optimization.
Random Write Performance
Our initial 8KiB random write measurements indicated that the expected performance for this benchmark was met, and no improvements were necessary. As anticipated, high journal replication latency was the performance bottleneck in this case.
Conclusion
Google Cloud NetApp Volumes (GCNV) block service performance was evaluated using the 4-corner microbenchmarks, focusing on sequential and random read/write workloads. The evaluation revealed several performance bottlenecks, which were addressed through optimizations such as thread pool resizing, caching policy adjustments, and parameter tuning, improving throughput and latency while maintaining balanced performance across the 4 corners.
Appendix
OPs-Latency Graph
The OPs-latency graphs shared in this document report the achieved throughput, in terms of either IOPS or MiB/s, as well as the latency from ONTAP's perspective. Clients might see higher latency depending on their connectivity to the ONTAP cluster. In these graphs, the x-axis shows throughput and the y-axis shows latency. Each data point on a curve represents one iteration of the performance measurement: the leftmost data point represents the lowest offered load, while the rightmost one represents the highest achieved load.
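As a rough illustration of how such a curve can be rendered, the snippet below plots latency against throughput with matplotlib. The data points are made-up placeholders, not measured values from this document.

```python
# Sketch of how an OPS-latency curve like the ones in this document can be
# plotted. The data points below are made-up placeholders, not measured values.
import matplotlib.pyplot as plt

# Each (throughput, latency) pair represents one iteration of the sweep,
# ordered from lowest to highest offered load.
ops = [20_000, 60_000, 100_000, 140_000, 160_000]
lat_ms = [0.35, 0.45, 0.60, 0.85, 1.00]

plt.plot(ops, lat_ms, marker="o", label="8KiB random read (example)")
plt.xlabel("Throughput (Ops/s)")
plt.ylabel("Latency (ms)")
plt.title("OPS-latency curve")
plt.legend()
plt.savefig("ops_latency.png")
```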
This blog provides a comprehensive guide on leveraging Grafana's alerting capabilities to monitor NetApp Trident protect deployments. It covers the setup of alert rules for critical events, ensuring timely notifications for appVault and backup failures. By integrating Prometheus and Grafana, users can maintain the availability and reliability of Kubernetes applications, proactively addressing potential issues to ensure smooth operations and data protection.
Unlocking Kafka Performance with FSxN ONTAP
Discover how Apache Kafka, running in KRaft mode, achieves enterprise-grade performance, security, and cost-efficiency when paired with Amazon FSx for NetApp ONTAP (FSxN). This technical deep dive compares Kafka’s behavior across EBS, NVMe, and FSxN storage options, highlighting FSxN’s superior throughput, resilience, and scalability for mission-critical workloads.
Explore benchmark results, security best practices, and cost analysis that reveal why FSxN ONTAP is the optimal choice for both short-term and long-term Kafka deployments. Whether you're building real-time analytics pipelines or IoT messaging systems, this guide helps you make informed decisions for high-performance, durable, and cost-effective Kafka infrastructure.
Scaling Cassandra with FSxN ONTAP—Performance, Resilience, and Cost Efficiency
Explore how Apache Cassandra, a leading NoSQL database, achieves enterprise-grade scalability and reliability when deployed on Amazon FSx for NetApp ONTAP (FSxN). This article dives into the architectural advantages of ONTAP’s shared storage model, benchmarking results against EBS, and the cost-saving potential for medium to large-scale Cassandra clusters.
Learn how FSxN ONTAP enhances data durability, simplifies replication strategies, and supports high availability without sacrificing performance. Whether you're running mission-critical workloads or planning for horizontal scalability, this guide reveals why FSxN ONTAP is the smart choice for modern Cassandra deployments.
Hyper-V, Microsoft's virtualization platform, remains a significant player in the virtualization market, competing with other major platforms. It is widely adopted due to its integration with Windows Server, cost-effectiveness, and robust feature set.
With many organizations planning to adopt Hyper-V, there are questions about how to protect virtual machines and address core challenges such as:
Achieving the 3-2-1 backup strategy, which entails maintaining three copies of data on two different media, with one copy stored off-site.
Ensuring the integrity and consistency of backups across different media and locations, which requires rigorous monitoring and management.
The substantial cost associated with maintaining multiple copies of data and off-site storage.
Meeting the recovery time objective (RTO) with a fast and reliable recovery process.
Ransomware can target backup files, rendering them useless; ensuring backups are immutable and isolated from the production environment is vital.
NetApp Backup and Recovery for Hyper-V, accessible from the NetApp Console, can help overcome these challenges effectively while also improving operational efficiency. With tight integration with ONTAP, NetApp Backup and Recovery provides fast, space-efficient, crash-consistent, and VM-consistent backup and restore operations.
NetApp Backup and Recovery for Hyper-V is currently in private preview and supports standalone or failover-cluster VM setups over SMB. The following features are supported:
3-2-1 backup architecture to back up snapshots to a secondary ONTAP cluster and/or to an object store (AWS S3, Azure Blob, StorageGRID, ONTAP S3)
Virtual machine backup and restore from ONTAP or from object store
Protect VMs with tamperproof snapshots or DataLocking on object store
Here are the prerequisites before configuring Hyper-V VM backup:
Sign up for access to the Hyper-V protection service by filling out this form
Deploy the NetApp Agent as described in the documentation and create an organization
Grant "Backup and Recovery super admin" access in NetApp Console-> Administration->Identity and access
Install the following features on the Hyper-V server
PowerShell 7.4.2 or later
ASP.NET Core Runtime 8.0.12 Hosting Bundle
Ensure the Host Guardian Service role is installed
Ensure two-way HTTPS traffic is allowed in Windows Firewall settings for the following ports
8144 (NetApp Plug-in for Hyper-V)
8145 (NetApp Plug-in for Windows)
Add the ONTAP cluster that hosts all the VM virtual disks in NetApp Console > Storage > Management
From the NetApp Console, go to the left pane and click Protection > Backup and Recovery, then select the Hyper-V workload. First-time users will be asked to enter the Hyper-V server details, such as the server IP, credentials to connect to the server, and the port and path to install the lightweight plug-in.
NetApp Backup and Recovery will automatically discover the virtual machines running in the Hyper-V environment. The first step in protecting Hyper-V is to configure a policy that sets the backup strategy (disk to disk, disk to object store, disk to disk to object store, and so on), the schedule and frequency of backups across primary and secondary, and whether snapshot locking is enabled.
VMs can be protected individually or in a group by creating a protection group. Assuming all the VMs need to be backed up together, create a protection group and add the VMs to it.
Add a policy to the protection group and review the configuration.
Backup is now configured for the group of VMs. You can run an on-demand backup as needed.
Here are the VM protection details after running the backup multiple times. The location column in the screenshot shows where the snapshots are saved. If the icon is blue, the snapshots are saved on the primary ONTAP cluster, the secondary ONTAP cluster, and the object store.
If an application is deleted from a VM, whether accidentally or intentionally, the restore process is straightforward. You can easily browse the VM list on the restore page and click restore.
Select the snapshot to recover the VM and initiate restore operation.
Select the location of the backup from where the VM must be restored.
Optionally, check the box if the VM needs to be started post-restore.
Once the restore is done, verify that the application setup is restored and the application works as usual.
Summary & Call to action:
NetApp Backup and Recovery configures and orchestrates the protection of Hyper-V virtual machines, simplifying the process and reducing the operational challenges of backup infrastructure. To learn more about NetApp Backup and Recovery, check the documentation. NetApp Backup and Recovery is currently in private preview. Sign up by filling out the form to explore the functionality, and share feedback or queries with us at ng-backupservice-feedback@netapp.com.
Similar protection capabilities are available for other enterprise workloads and detailed blogs for each workload are as follows:
SQL Server
VMware
Kubernetes