Observability for Google Cloud NetApp Volumes

okrause · ‎2025-08-20

We have significantly enhanced the monitoring capabilities of Google Cloud NetApp Volumes by integrating built-in Observability charts into its user interface. These charts give users a convenient way to track key performance and utilization metrics for each individual volume and provide immediate insight into its operational status.

Extending existing capabilities

NetApp Volumes facilitates a comprehensive monitoring strategy by transmitting all relevant resource metrics to Google Cloud Monitoring. This integration allows centralized and more extensive observability, so that users can leverage the full power of Google Cloud's robust monitoring and notification tools. In Google Cloud Monitoring, these metrics can be viewed, analyzed, and correlated with other cloud resources, enabling a holistic view of the entire infrastructure.

For full information about using Cloud Monitoring to monitor key NetApp Volumes metrics in your project, see the Monitoring NetApp Volumes blog post. A follow-on blog post explains how to configure alerting in Cloud Monitoring to receive notifications for critical issues like volumes running out of space.

These blog posts link to a best practice dashboard that can easily be imported into Cloud Monitoring. The dashboard is an efficient way to keep an eye on all of the volumes in your project.

Observability UI in NetApp Volumes

Although it’s good practice to keep an eye on all your volumes in daily operations, there are also times when you’re working with a specific volume and want to quickly look up the metrics for that volume. Of course, you can go to the dashboard and filter for the individual volume, but wouldn’t it be nice to quickly see the relevant graphs in the NetApp Volumes UI?

That’s why we added the Observability tab to the volume details page in Cloud Console.

The example screenshot shows the content of the Observability tab of a volume named okdata. It shows a predefined dashboard for the volume, with multiple predefined charts. (Note that this is a demo volume with almost no I/O happening. That’s why the charts look so boring.)

The predefined charts are a selection of key metrics to watch. Let’s look into the details.

Volume capacity usage [%] shows how much of the volume space you are using versus the volume’s available space. If you reach 100%, you run out of free space and write operations to the volume will fail. To avoid that failure, you can set up alerting to get notified early enough to add more space to the volume. Note that space used by snapshots counts as consumed space.
Volume inode usage [%] shows how many inodes are used versus the number of available inodes in the volume. Every file or directory in the volume is an inode. If you run out of free inodes, you cannot add new files or folders to the volume. To avoid that possibility, you can set up alerting to get notified early enough to add more space to the volume.
Volume throughput usage [%] shows the throughput used versus its throughput capability.
For the service levels Standard, Premium, and Extreme, the maximum volume throughput capability is defined by the volume size and its service level. If this graph stays at the 100% ceiling for an extended period of time, the volume offers too little throughput for the workload, and you should add capacity or change its service level. Note that hitting 100% also causes latency to increase significantly.
For volumes of service type Flex, the maximum volume throughput capability is defined by the storage pool and shared among all the volumes in the pool. The graph shows volume throughput versus storage pool throughput capability. If another volume in the pool uses up the throughput capability, you may run into throughput limits even if the graph doesn’t show 100% utilization.
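The Flex case can be illustrated with a small calculation. This is only a sketch with made-up numbers; the function name, the volume names, and the throughput values are hypothetical and are not taken from the product or its API. It shows why a single volume's chart can sit well below 100% while the pool as a whole is already saturated.

```python
def pool_throughput_usage(pool_capability_mibps, volume_throughput_mibps):
    """Return (per-volume usage [%] of the pool capability, pool-wide usage [%]).

    Each volume's percentage is measured against the shared pool capability,
    which is how the text describes the Flex throughput usage chart.
    """
    per_volume = {
        name: 100.0 * used / pool_capability_mibps
        for name, used in volume_throughput_mibps.items()
    }
    pool_total = sum(per_volume.values())
    return per_volume, pool_total


# Hypothetical pool with 1000 MiB/s capability shared by two volumes.
per_volume, pool_total = pool_throughput_usage(
    pool_capability_mibps=1000.0,
    volume_throughput_mibps={"vol-a": 300.0, "vol-b": 700.0},
)

for name, pct in per_volume.items():
    print(f"{name}: {pct:.0f}% of pool capability")
print(f"pool total: {pool_total:.0f}%")
```

In this example vol-a shows only 30% on its own chart, yet it cannot go faster because vol-b consumes the remaining 70% of the pool.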

The three usage charts quickly tell you if you are running into capacity, inode, or performance limits.
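As a rough illustration of what the three percentages express, the following sketch computes them from raw values and flags the ones that cross a warning threshold. The data class, field names, sample values, and the 80% threshold are all assumptions made for this example; they do not correspond to actual Cloud Monitoring metric names or recommended alert levels.

```python
from dataclasses import dataclass


@dataclass
class VolumeSample:
    """One hypothetical set of raw values for a single volume."""
    used_bytes: int
    capacity_bytes: int
    used_inodes: int
    max_inodes: int
    throughput_mibps: float
    throughput_limit_mibps: float


def usage_percentages(sample):
    """Compute capacity, inode, and throughput usage in percent."""
    return {
        "capacity": 100.0 * sample.used_bytes / sample.capacity_bytes,
        "inodes": 100.0 * sample.used_inodes / sample.max_inodes,
        "throughput": 100.0 * sample.throughput_mibps / sample.throughput_limit_mibps,
    }


def limits_at_risk(sample, threshold_pct=80.0):
    """Return the names of all usage values at or above the threshold."""
    usage = usage_percentages(sample)
    return sorted(name for name, pct in usage.items() if pct >= threshold_pct)


GIB = 1024 ** 3

# Made-up sample: 900 GiB used of 1 TiB, few inodes used, half the throughput.
sample = VolumeSample(
    used_bytes=900 * GIB,
    capacity_bytes=1024 * GIB,
    used_inodes=1_000_000,
    max_inodes=20_000_000,
    throughput_mibps=32.0,
    throughput_limit_mibps=64.0,
)

print(usage_percentages(sample))
print(limits_at_risk(sample))  # only "capacity" is above 80% here
```

A threshold below 100% leaves time to react, which is the point of alerting early rather than waiting for writes to fail.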

The next three charts show absolute numbers for Volume throughput, Volume IOPS, and Volume latency. Volume latency shows the time between the NFS/SMB request entering the volume and the response leaving the volume. Networking between your client and the volume, or request queuing inside your client, adds latency on top.

The next chart shows hot tier versus cold tier capacity for the volume. As you can see, the demo volume had over 800 GiB of cold data and 71 GiB of hot data.

A final chart shows the number of read, write, and metadata operations performed on the volume. Unlike SAN workloads, NFS/SMB workloads can be very metadata intensive, for example because of directory lookups. This chart gives a high-level overview of the I/O operation mix of the workload inside the volume.
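To make the idea of an operation mix concrete, here is a minimal sketch that turns three operation counts into percentage shares. The function name and the counts are hypothetical; the chart itself shows the raw operation numbers, and this calculation is just one way to read them.

```python
def operation_mix(read_ops, write_ops, metadata_ops):
    """Return the share of each operation type in percent of all operations."""
    total = read_ops + write_ops + metadata_ops
    if total == 0:
        # An idle volume has no meaningful mix; report zeros instead of dividing by zero.
        return {"read": 0.0, "write": 0.0, "metadata": 0.0}
    return {
        "read": 100.0 * read_ops / total,
        "write": 100.0 * write_ops / total,
        "metadata": 100.0 * metadata_ops / total,
    }


# Made-up counts for a metadata-heavy workload, such as one dominated by directory lookups.
mix = operation_mix(read_ops=200, write_ops=100, metadata_ops=700)
print(mix)
```

A workload like this one spends most of its operations on metadata, so throughput charts alone would understate how busy the volume is.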

Note: NetApp Volumes sends metrics to Cloud Monitoring every 5 minutes. All the charts in Cloud Monitoring or the Observability tab show 5-minute samples. You can’t observe events that happen in a shorter time.
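The effect of the 5-minute granularity can be sketched with a short calculation. The example assumes that each sample represents an average over its 5-minute interval; the text does not state how samples are aggregated, so treat this as an illustration only. The per-second throughput values are invented.

```python
def interval_average(per_second_values):
    """Average a list of per-second values into a single sample."""
    return sum(per_second_values) / len(per_second_values)


# One 5-minute interval (300 seconds) of hypothetical throughput in MiB/s:
# a 10-second burst at 500 MiB/s, followed by 290 quiet seconds at 5 MiB/s.
per_second = [500.0] * 10 + [5.0] * 290

sample = interval_average(per_second)
peak = max(per_second)

print(f"peak inside the interval: {peak:.1f} MiB/s")
print(f"value seen in the chart:  {sample:.1f} MiB/s")
```

Under this assumption the chart would show about 21.5 MiB/s for the interval, and the 500 MiB/s burst would not be visible at all.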

The new Observability feature is in preview, and we plan to introduce improvements to the predefined dashboard in the future. If you have suggestions for changes that would benefit every user, please tell us.

Best practice

Should you drop your Cloud Monitoring dashboard for NetApp Volumes and go with the Observability UI tab instead? The answer is probably no.

The Observability UI tab offers a predefined dashboard for individual volumes to gain quick insights into what is going on with the volume.

A Cloud Monitoring dashboard gives you an overview of all the volumes and allows you to monitor a “farm” of volumes.

This dual approach gives users both immediate, granular insights directly in NetApp Volumes and the extensive analytical capabilities offered by Google Cloud's broader monitoring platform.
