We're excited to announce the launch of our comprehensive Infrastructure as Code (IaC) repository for Azure NetApp Files deployments. Whether you're a seasoned cloud architect or just getting started with enterprise storage solutions, this repository provides everything you need to deploy, configure, and manage Azure NetApp Files at scale.
Why We Built This
Deploying Azure NetApp Files infrastructure can be complex, especially when integrating with virtual machines, networks, and enterprise workloads. Teams often face challenges such as:
Inconsistent deployments across environments
Manual configuration errors
Lack of standardized templates
Time-consuming setup processes
Limited examples for common use cases
We created this repository to address these pain points and provide the cloud community with battle-tested, production-ready templates that follow Azure best practices.
What's Inside the Repository
Three Deployment Options
We understand that every team has its preferred tools. That's why we've built templates for all three major IaC approaches:
ARM Templates - Native Azure Resource Manager templates for teams working exclusively in Azure
Terraform - Cross-platform IaC for multi-cloud environments
PowerShell - Automation scripts for Windows-centric teams and existing PowerShell workflows
Ready-to-Deploy Scenarios
The repository includes three primary deployment scenarios, each available in all three formats:
1. NFS Volume Deployment
The foundational building block for Azure NetApp Files:
NetApp account configuration
Capacity pool setup
NFS volume with customizable size and service level
Virtual network with delegated subnet
Perfect for teams who want to integrate ANF into existing infrastructure.
Deploy Now:
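If you prefer the command line over the portal button, a minimal Azure CLI deployment of this scenario might look like the sketch below. The template and parameter file paths are assumptions based on common repository conventions; check the repository README for the actual file names.
# Create a resource group and deploy the NFS volume scenario (ARM template variant)
az group create --name anf-demo-rg --location eastus

az deployment group create \
  --resource-group anf-demo-rg \
  --template-file arm-templates/nfs-volume/azuredeploy.json \
  --parameters @arm-templates/nfs-volume/azuredeploy.parameters.json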
2. Linux VM with NFS Volume
A complete end-to-end solution:
Ubuntu 22.04 LTS virtual machine
Automatically mounted NFS volume
Network security group configuration
Public IP for SSH access
Production-ready networking setup
Ideal for development environments, testing, or single-server applications.
Deploy Now:
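For reference, the automatic NFS mount performed by this scenario is roughly equivalent to the following manual steps on Ubuntu. The volume IP and export path are placeholders, and the actual mount options used by the templates may differ.
# Install the NFS client and mount the ANF volume (10.0.1.4 and /nfs-volume are placeholders)
sudo apt-get update && sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/anf
sudo mount -t nfs -o rw,hard,rsize=262144,wsize=262144,vers=3,tcp 10.0.1.4:/nfs-volume /mnt/anf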
3. Multi-VM with Shared NFS Storage
Enterprise-grade high availability setup:
Multiple Linux VMs (configurable count)
Shared NFS volume across all instances
Azure Load Balancer for traffic distribution
Network security groups for each tier
High availability configuration
Designed for production workloads requiring scalability and redundancy.
Deploy Now:
Deploy to Azure in Minutes
One of the standout features is our "Deploy to Azure" buttons. With a single click, you can:
Launch the Azure Portal with pre-configured templates
Sign in with your Azure credentials
Fill in environment-specific parameters
Review and deploy
No need to clone repositories or set up a local development environment for quick testing and proofs of concept.
Security Built-In
Security isn't an afterthought—it's baked into every template:
Network Security Groups with least-privilege access rules
Subnet delegation for Azure NetApp Files
Encryption in transit and at rest
Managed identities for secure authentication
Azure Key Vault integration for secrets management
RBAC controls for access management
All templates follow the Azure Well-Architected Framework security principles.
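To illustrate the subnet delegation mentioned above, delegating a subnet to Azure NetApp Files with the Azure CLI looks roughly like this (resource names and the address prefix are placeholders):
# Delegate a subnet to Microsoft.NetApp/volumes so ANF volumes can be created in it
az network vnet subnet create \
  --resource-group anf-demo-rg \
  --vnet-name anf-vnet \
  --name anf-delegated-subnet \
  --address-prefixes 10.0.1.0/24 \
  --delegations Microsoft.NetApp/volumes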
Comprehensive Documentation
We've included extensive documentation to help you succeed:
Deployment Guides - Step-by-step instructions for each scenario
Architecture Diagrams - Visual representations of deployed infrastructure
Troubleshooting Guides - Solutions to common issues
Parameter Files - Example configurations for different environments
Best Practices - Security, performance, and operational recommendations
Getting Started
Prerequisites
Before you begin, ensure you have:
An active Azure subscription
Contributor or Owner permissions on your target resource group
Azure NetApp Files enabled in your subscription
Your preferred IaC tool installed (Azure CLI, Terraform, or PowerShell)
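As a quick sanity check of these prerequisites, you can verify your active subscription and the Azure NetApp Files resource provider registration with the Azure CLI. This is a minimal sketch; the provider only needs to be registered once per subscription.
# Show the active subscription and make sure the NetApp resource provider is registered
az account show --query name -o tsv
az provider register --namespace Microsoft.NetApp
az provider show --namespace Microsoft.NetApp --query registrationState -o tsv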
Quick Start
# Clone the repository
git clone https://github.com/NetApp/azure-netapp-files-storage.git
cd azure-netapp-files-storage

# Choose your tool
cd arm-templates/   # or terraform/ or powershell/

# Start with the basic scenario
cd nfs-volume/

# Follow the README for deployment instructions
Or simply click one of the "Deploy to Azure" buttons above for instant deployment.
Real-World Use Cases
This repository supports a variety of enterprise scenarios:
Development and Testing - Quickly spin up isolated environments with shared storage for development teams
High-Performance Computing - Deploy scalable NFS storage for compute-intensive workloads
Database Workloads - Host database files on high-performance NFS volumes with enterprise features
Web and Application Servers - Share configuration files, logs, and content across multiple Linux servers with load balancing
Content Management - Share media files and documents across multiple application servers
Backup and Recovery - Leverage snapshot capabilities for data protection
Community and Support
This is an open-source project, and we welcome community contributions:
Report Issues - Found a bug? Let us know through GitHub Issues
Request Features - Have an idea? Submit a feature request
Ask Questions - Use GitHub Discussions for community support
Contribute - Submit pull requests to improve templates and documentation
What's Next
We're continuously improving this repository based on community feedback. Our roadmap includes:
Additional workload-specific templates (SAP, Oracle, SQL Server)
Cross-region replication scenarios
Disaster recovery configurations
Performance optimization guides
Cost optimization templates
Integration with Azure Monitor and Log Analytics
Try It Today
Ready to simplify your Azure NetApp Files deployments? Visit the repository and start deploying:
GitHub Repository: https://github.com/NetApp/azure-netapp-files-storage
Whether you're deploying your first NFS volume or architecting enterprise-scale solutions, this repository provides the tools and guidance you need to succeed.
Have questions or feedback? Open an issue on GitHub or join the discussion. We'd love to hear how you're using these templates in your environment!
NetApp® Trident™ protect provides advanced application data management capabilities that enhance the functionality and availability of stateful Kubernetes applications supported by NetApp ONTAP storage systems and the NetApp Trident Container Storage Interface (CSI) storage provisioner. It is compatible with a wide range of fully managed and self-managed Kubernetes offerings (see the supported Kubernetes distributions and storage back ends), making it an optimal solution for protecting your Kubernetes services across various platforms and regions. In this blog post, I will demonstrate how to scrape and visualize the metrics provided by Trident and Trident Protect using the popular open-source monitoring and visualization frameworks Prometheus and Grafana.
Prerequisites
To follow along with this guide, ensure you have the following:
A Kubernetes cluster with the latest versions of Trident and Trident protect installed, and their associated kubeconfig files
A NetApp ONTAP storage back end and Trident with configured storage back ends, storage classes, and volume snapshot classes
Configured object storage buckets for storing backups and metadata, with bucket replication configured
A workstation with kubectl configured to use the kubeconfig files
The Trident protect CLI (tridentctl-protect) installed on your workstation
Admin permissions on the Kubernetes clusters
Prepare test environment
First, let's quickly go through the setup of the test environment used throughout this blog.
Sample application
We will use a simple MinIO application with a persistent volume on Azure NetApp Files (ANF) as our sample application for the monitoring tests. The MinIO application is deployed on an Azure Kubernetes Service (AKS) cluster with NetApp Trident 25.06.0 installed and configured:
$ kubectl get all,pvc -n minio
NAME READY STATUS RESTARTS AGE
pod/minio-67dffb8bbd-5rfpm 1/1 Running 0 14m
pod/minio-console-677bd9ddcb-27497 1/1 Running 0 14m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/minio ClusterIP 172.16.61.243 <none> 9000/TCP 14m
service/minio-console ClusterIP 172.16.95.239 <none> 9090/TCP 14m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/minio 1/1 1 1 14m
deployment.apps/minio-console 1/1 1 1 14m
NAME DESIRED CURRENT READY AGE
replicaset.apps/minio-67dffb8bbd 1 1 1 14m
replicaset.apps/minio-console-677bd9ddcb 1 1 1 14m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/minio Bound pvc-ec50d895-4048-4a51-a651-5439b2a5ba2a 50Gi RWO azure-netapp-files-standard <unset> 14m
Create a Trident Protect Application
Create a Trident protect application minio based on the minio namespace with the Trident protect CLI:
$ tridentctl-protect create application minio --namespaces minio -n minio
Application "minio" created.
Create a snapshot minio-snap and a backup minio-bkp:
$ tridentctl-protect create snapshot minio-snap --app minio --appvault demo -n minio
Snapshot "minio-snap" created.
$ tridentctl-protect create backup minio-bkp --app minio --appvault demo -n minio
Backup "minio-bkp" created.
Install kube-state-metrics
Trident protect leverages kube-state-metrics (KSM) to provide information about the health status of its resources. Kube-state-metrics is an open-source add-on for Kubernetes that listens to the Kubernetes API server and generates metrics about the state of various Kubernetes objects.
Install Prometheus ServiceMonitor CRD
First, we install the Custom Resource Definition (CRD) for the Prometheus ServiceMonitor using Helm. Add the Prometheus-community helm repository:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
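If the ServiceMonitor CRD is not yet present on your cluster, one option is the prometheus-operator-crds chart from the repository added above (a hedged example; the Prometheus operator installed later in this post may also bring its own CRDs):
$ kubectl create namespace prometheus
$ helm install prometheus-operator-crds prometheus-community/prometheus-operator-crds -n prometheus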
Install and configure kube-state-metrics
Now, we install and configure kube-state-metrics to generate metrics from the Kubernetes API. Using it with Trident protect will expose useful information about the state of the Trident protect custom resources in our environment.
Let's create a configuration file for the KSM helm chart to monitor these Trident Protect CRs:
Snapshots
Backups
ExecutionHooksRuns
AppVaults (added in a later step)
Let’s take a closer look at the snapshot CR minio-snap that we created earlier.
$ kubectl -n minio get snapshot minio-snap -o yaml
apiVersion: protect.trident.netapp.io/v1
kind: Snapshot
metadata:
annotations:
protect.trident.netapp.io/correlationid: 42111244-fdb7-41f1-af39-7b61fdb0c7e1
creationTimestamp: "2025-08-18T15:25:40Z"
...
name: minio-snap
namespace: minio
ownerReferences:
- apiVersion: protect.trident.netapp.io/v1
kind: Application
name: minio
uid: efc8cdd4-8b20-48e0-8944-eeee8aba98f9
resourceVersion: "14328"
uid: c569472c-ae13-4d30-bffd-98acef304abc
spec:
appVaultRef: demo
applicationRef: minio
cleanupSnapshot: false
completionTimeout: 0s
reclaimPolicy: Delete
volumeSnapshotsCreatedTimeout: 0s
volumeSnapshotsReadyToUseTimeout: 0s
status:
appArchivePath: minio_efc8cdd4-8b20-48e0-8944-eeee8aba98f9/snapshots/20250818152540_minio-snap_c569472c-ae13-4d30-bffd-98acef304abc
appVaultRef: demo
completionTimestamp: "2025-08-18T15:25:58Z"
...
postSnapshotExecHooksRunResults: []
preSnapshotExecHooksRunResults: []
state: Completed
volumeSnapshots:
- name: snapshot-c569472c-ae13-4d30-bffd-98acef304abc-pvc-ec50d895-4048-4a51-a651-5439b2a5ba2a
namespace: minio
From its metadata section, we want to expose the snapshot's name, UID, and creationTimestamp to Prometheus, and from the spec and status sections the appVaultRef, applicationRef, and state fields. The corresponding KSM configuration entry looks like this.
resources:
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Snapshot"
version: "v1"
labelsFromPath:
snapshot_uid: [metadata, uid]
snapshot_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: snapshot_info
help: "Exposes details about the Snapshot state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
From the backup CR, which has the same structure as the snapshot CR, we can collect the same information using this KSM configuration entry.
resources:
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Backup"
version: "v1"
labelsFromPath:
backup_uid: [metadata, uid]
backup_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: backup_info
help: "Exposes details about the Backup state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
To access those CR fields, KSM needs to have the corresponding RBAC permissions to allow access to the snapshot and backup CRs in all namespaces (since the Trident protect CRs are created in the application namespace). So we add the following parameters to the KSM configuration file.
rbac:
extraRules:
- apiGroups: ["protect.trident.netapp.io"]
resources: ["snapshots", "backups"]
verbs: ["list", "watch"]
# collect metrics from ALL namespaces
namespaces: ""
Collecting the details for the executionHooksRuns works in the same way as for snapshots and backups, so we don’t show the details here. Putting everything together, our first KSM configuration file looks like this.
$ cat metrics-config-backup-snapshot-hooks.yaml
extraArgs:
# collect only our metrics, not the defaults ones (deployments etc.)
- --custom-resource-state-only=true
customResourceState:
enabled: true
config:
kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Snapshot"
version: "v1"
labelsFromPath:
snapshot_uid: [metadata, uid]
snapshot_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: snapshot_info
help: "Exposes details about the Snapshot state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Backup"
version: "v1"
labelsFromPath:
backup_uid: [metadata, uid]
backup_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: backup_info
help: "Exposes details about the Backup state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Exechooksruns"
version: "v1"
labelsFromPath:
ehr_uid: [metadata, uid]
ehr_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: ehr_info
help: "Exposes details about the Exec Hook state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
stage: ["spec", stage]
action: ["spec", action]
status: [status, state]
rbac:
extraRules:
- apiGroups: ["protect.trident.netapp.io"]
resources: ["snapshots"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["backups"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["exechooksruns"]
verbs: ["list", "watch"]
# collect metrics from ALL namespaces
namespaces: ""
# deploy a ServiceMonitor so the metrics are collected by Prometheus
prometheus:
monitor:
enabled: true
additionalLabels:
release: prometheus
Now we can install the KSM using Helm.
$ helm install trident-protect -f ./metrics-config-backup-snapshot-hooks.yaml prometheus-community/kube-state-metrics --version 5.21.0 -n prometheus
NAME: trident-protect
LAST DEPLOYED: Tue Aug 19 17:54:22 2025
NAMESPACE: prometheus
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
The exposed metrics can be found here:
https://github.com/kubernetes/kube-state-metrics/blob/master/docs/README.md#exposed-metrics
The metrics are exported on the HTTP endpoint /metrics on the listening port.
In your case, trident-protect-kube-state-metrics.prometheus.svc.cluster.local:8080/metrics
They are served either as plaintext or protobuf depending on the Accept header.
They are designed to be consumed either by Prometheus itself or by a scraper that is compatible with scraping a Prometheus client endpoint.
We check that the KSM ServiceMonitor was correctly deployed in the prometheus namespace.
$ kubectl -n prometheus get smon -l app.kubernetes.io/instance=trident-protect
NAME AGE
trident-protect-kube-state-metrics 90s
$ kubectl get all -n prometheus
NAME READY STATUS RESTARTS AGE
pod/trident-protect-kube-state-metrics-94d55666c-69j6n 1/1 Running 0 105s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/trident-protect-kube-state-metrics ClusterIP 172.16.88.31 <none> 8080/TCP 105s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/trident-protect-kube-state-metrics 1/1 1 1 105s
NAME DESIRED CURRENT READY AGE
replicaset.apps/trident-protect-kube-state-metrics-94d55666c 1 1 1 105s
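As an optional sanity check, we can port-forward the KSM service and confirm that the custom resource metrics are exposed. With default settings, KSM typically prefixes custom resource state metrics with kube_customresource_, so grepping for the metric names defined above works regardless of the prefix.
$ kubectl -n prometheus port-forward svc/trident-protect-kube-state-metrics 8080:8080 &
$ curl -s http://localhost:8080/metrics | grep -E 'snapshot_info|backup_info|ehr_info' | head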
Prometheus installation
Let’s install Prometheus now on our cluster. Before doing that, we must make sure that the Prometheus server can access the Kubernetes API.
RBAC permissions
The Prometheus server needs access to the Kubernetes API to scrape targets, so it requires a ServiceAccount that is bound to a ClusterRole with the appropriate permissions. By applying the YAML file below, we create the ServiceAccount prometheus and a ClusterRole prometheus with the necessary privileges, and bind them together with a ClusterRoleBinding.
$ cat ./rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
namespace: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/metrics
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources:
- configmaps
verbs: ["get"]
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
namespace: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: prometheus
$ kubectl apply -f ./rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
Now we’re ready to install Prometheus.
Deploy Prometheus
After creating the Prometheus ServiceAccount and giving it access to the Kubernetes API, we can deploy the Prometheus instance.
We’ll use the Prometheus operator for the installation. Following the instructions to install the operator in the prometheus namespace deploys it on our K8s cluster within a few minutes.
The Prometheus manifest below defines the serviceMonitorNamespaceSelector, serviceMonitorSelector, and podMonitorSelector fields to specify which CRs to include. In this example, the {} value is used to match all existing CRs.
$ cat ./prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
namespace: prometheus
spec:
serviceAccountName: prometheus
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
podMonitorSelector: {}
resources:
requests:
memory: 400Mi
We apply the manifest and check that the Prometheus instance eventually reaches the Running state and that a prometheus-operated Service was created:
$ kubectl apply -f ./prometheus.yaml
prometheus.monitoring.coreos.com/prometheus created
$ kubectl get prometheus -n prometheus
NAME VERSION DESIRED READY RECONCILED AVAILABLE AGE
prometheus 1 True True 42s
$ kubectl get services -n prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-operated ClusterIP None <none> 9090/TCP 103s
prometheus-operator ClusterIP None <none> 8080/TCP 7m44s
trident-protect-kube-state-metrics ClusterIP 172.16.88.31 <none> 8080/TCP 17h
$ kubectl get all -n prometheus
NAME READY STATUS RESTARTS AGE
pod/prometheus-operator-5d697c648f-22lrz 1/1 Running 0 6m21s
pod/prometheus-prometheus-0 2/2 Running 0 20s
pod/trident-protect-kube-state-metrics-94d55666c-69j6n 1/1 Running 0 17h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-operated ClusterIP None <none> 9090/TCP 20s
service/prometheus-operator ClusterIP None <none> 8080/TCP 6m21s
service/trident-protect-kube-state-metrics ClusterIP 172.16.88.31 <none> 8080/TCP 17h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-operator 1/1 1 1 6m21s
deployment.apps/trident-protect-kube-state-metrics 1/1 1 1 17h
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-operator-5d697c648f 1 1 1 6m21s
replicaset.apps/trident-protect-kube-state-metrics-94d55666c 1 1 1 17h
NAME READY AGE
statefulset.apps/prometheus-prometheus 1/1 20s
To quickly test the Prometheus installation, let’s use port-forwarding.
$ kubectl -n prometheus port-forward svc/prometheus-operated 9090:9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
By pointing a web browser to http://localhost:9090 we can view the Prometheus console:
Configure the monitoring tools to work together
Now that all the monitoring tools are installed, we need to configure them to work together. To integrate kube-state-metrics with Prometheus, we edit our Prometheus configuration file (prometheus.yaml), add the kube-state-metrics service information to it, and save it as prometheus-ksm.yaml.
$ cat ./prometheus-ksm.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
namespace: prometheus
spec:
serviceAccountName: prometheus
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
podMonitorSelector: {}
resources:
requests:
memory: 400Mi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: trident-protect
data:
prometheus.yaml: |
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.trident-protect.svc:8080']
$ diff ./prometheus.yaml ./prometheus-ksm.yaml
13a14,27
> ---
> apiVersion: v1
> kind: ConfigMap
> metadata:
> name: prometheus-config
> namespace: trident-protect
> data:
> prometheus.yaml: |
> global:
> scrape_interval: 15s
> scrape_configs:
> - job_name: 'kube-state-metrics'
> static_configs:
> - targets: ['kube-state-metrics.trident-protect.svc:8080']
After applying the manifest, we confirm that the prometheus-config configuration map was created in the trident-protect namespace:
$ kubectl apply -f ./prometheus-ksm.yaml
prometheus.monitoring.coreos.com/prometheus unchanged
configmap/prometheus-config created
$ kubectl -n trident-protect get cm
NAME DATA AGE
kube-root-ca.crt 1 46h
prometheus-config 1 59s
trident-protect-env-config 15 46h
Now we can query the backup, snapshot, and execution hook run information in Prometheus:
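For example, assuming the default kube_customresource_ prefix that KSM applies when no metricNamePrefix is configured (adjust the names if your setup differs), queries like these can be entered in the Prometheus expression field:
# All snapshot CRs with their labels (application, appVault, status, ...)
kube_customresource_snapshot_info

# Number of completed backups
count(kube_customresource_backup_info{status="Completed"})

# Execution hook runs grouped by stage
count by (stage) (kube_customresource_ehr_info)
The label sets returned by these queries correspond to the labelsFromPath entries defined in the KSM configuration above.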
This matches the two snapshots, one backup, and six execution hook runs we have in Trident protect:
$ tridentctl-protect get snapshot -A
+-----------+---------------------------------------------+-------+----------------+-----------+-------+-------+
| NAMESPACE | NAME | APP | RECLAIM POLICY | STATE | ERROR | AGE |
+-----------+---------------------------------------------+-------+----------------+-----------+-------+-------+
| minio | backup-3473b771-caa5-48d2-a9b6-41f4448a049d | minio | Delete | Completed | | 1d22h |
| minio | minio-snap | minio | Delete | Completed | | 1d22h |
+-----------+---------------------------------------------+-------+----------------+-----------+-------+-------+
$ tridentctl-protect get backup -A
+-----------+--------------+-------+----------------+-----------+-------+-------+
| NAMESPACE | NAME | APP | RECLAIM POLICY | STATE | ERROR | AGE |
+-----------+--------------+-------+----------------+-----------+-------+-------+
| minio | minio-backup | minio | Retain | Completed | | 1d22h |
+-----------+--------------+-------+----------------+-----------+-------+-------+
$ kubectl get ehr -A
NAMESPACE NAME STATE STAGE ACTION ERROR APP AGE
minio post-backup-3473b771-caa5-48d2-a9b6-41f4448a049d Completed Post Backup minio 46h
minio post-snapshot-7e7934a4-b51a-4bc4-a981-28a8ba137ff6 Completed Post Snapshot minio 46h
minio post-snapshot-c569472c-ae13-4d30-bffd-98acef304abc Completed Post Snapshot minio 46h
minio pre-backup-3473b771-caa5-48d2-a9b6-41f4448a049d Completed Pre Backup minio 46h
minio pre-snapshot-7e7934a4-b51a-4bc4-a981-28a8ba137ff6 Completed Pre Snapshot minio 46h
minio pre-snapshot-c569472c-ae13-4d30-bffd-98acef304abc Completed Pre Snapshot minio 46h
Let’s create a second backup:
$ tridentctl-protect create backup minio-bkp-2 --app minio --appvault demo --reclaim-policy Delete -n minio
Backup "minio-bkp-2" created.
Prometheus quickly picks up the backup in the Running state, and then the Completed state once the backup finishes.
Add additional metrics and information
Now we want to add metrics about additional custom resources and have any error states of the monitored custom resources reflected in Prometheus.
AppVault metrics and error details
To include metrics about the appVault CR and its error details, we add the following entry to the KSM configuration file:
- groupVersionKind:
group: protect.trident.netapp.io
kind: "AppVault"
version: "v1"
labelsFromPath:
appvault_uid: [metadata, uid]
appvault_name: [metadata, name]
metricsFromPath:
state: [status, state]
error: [status, error]
message: [status, message]
metrics:
- name: appvault_info
help: "Exposes details about the AppVault state"
each:
type: Info
info:
labelsFromPath:
state: [status, state]
error: [status, error]
message: [status, message]
The complete configuration file to collect metrics and error details from the snapshot, backup, execHooksRun, and appVault CRs is then:
$ cat ./metrics-config-backup-snapshot-hooks-appvault.yaml
extraArgs:
# collect only our metrics, not the defaults ones (deployments etc.)
- --custom-resource-state-only=true
customResourceState:
enabled: true
config:
kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Snapshot"
version: "v1"
labelsFromPath:
snapshot_uid: [metadata, uid]
snapshot_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: snapshot_info
help: "Exposes details about the Snapshot state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
error: [status, error]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Backup"
version: "v1"
labelsFromPath:
backup_uid: [metadata, uid]
backup_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: backup_info
help: "Exposes details about the Backup state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
error: [status, error]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Exechooksruns"
version: "v1"
labelsFromPath:
ehr_uid: [metadata, uid]
ehr_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: ehr_info
help: "Exposes details about the Exec Hook state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
stage: ["spec", stage]
action: ["spec", action]
status: [status, state]
error: [status, error]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "AppVault"
version: "v1"
labelsFromPath:
appvault_uid: [metadata, uid]
appvault_name: [metadata, name]
metricsFromPath:
state: [status, state]
error: [status, error]
message: [status, message]
metrics:
- name: appvault_info
help: "Exposes details about the AppVault state"
each:
type: Info
info:
labelsFromPath:
state: [status, state]
error: [status, error]
message: [status, message]
rbac:
extraRules:
- apiGroups: ["protect.trident.netapp.io"]
resources: ["snapshots"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["backups"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["exechooksruns"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["appvaults"]
verbs: ["list", "watch"]
# collect metrics from ALL namespaces
namespaces: ""
# deploy a ServiceMonitor so the metrics are collected by Prometheus
prometheus:
monitor:
enabled: true
additionalLabels:
release: prometheus
We update the KSM configuration:
$ helm upgrade trident-protect prometheus-community/kube-state-metrics -f ./metrics-config-backup-snapshot-hooks-appvault.yaml -n prometheus
Release "trident-protect" has been upgraded. Happy Helming!
NAME: trident-protect
LAST DEPLOYED: Wed Aug 20 16:47:06 2025
NAMESPACE: prometheus
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
The exposed metrics can be found here:
https://github.com/kubernetes/kube-state-metrics/blob/master/docs/README.md#exposed-metrics
The metrics are exported on the HTTP endpoint /metrics on the listening port.
In your case, trident-protect-kube-state-metrics.prometheus.svc.cluster.local:8080/metrics
They are served either as plaintext or protobuf depending on the Accept header.
They are designed to be consumed either by Prometheus itself or by a scraper that is compatible with scraping a Prometheus client endpoint.
Now the information about the appVault CR is available in Prometheus.
Test AppVault failure
To test Prometheus’ monitoring and error recognition, we provoke a failure of our appVault CR. To simulate losing access to the object storage bucket behind the appVault CR, we delete the secret with the access credentials from the trident-protect namespace.
$ kubectl -n trident-protect delete secret puneptunetest
secret "puneptunetest" deleted
After a few seconds, the AppVault CR goes into the Error state.
$ tridentctl-protect get appvault
+------+----------+-------+--------------------------------+---------+-----+
| NAME | PROVIDER | STATE | ERROR | MESSAGE | AGE |
+------+----------+-------+--------------------------------+---------+-----+
| demo | Azure | Error | failed to resolve value for | | 2d |
| | | | accountKey: unable to ... | | |
+------+----------+-------+--------------------------------+---------+-----+
And the error of the appVault CR is also reflected in Prometheus:
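Under the same metric-naming assumption as before, a query such as the following surfaces AppVaults in an error state, with the error text exposed as a label:
# AppVault CRs currently reporting an error
kube_customresource_appvault_info{state="Error"}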
AppMirrorRelationship metrics
With Trident protect, you can use the asynchronous replication capabilities of NetApp SnapMirror technology to replicate data and application changes from one storage backend to another, on the same cluster or between different clusters. AppMirrorRelationship (AMR) CRs control the replication relationship of an application protected by NetApp SnapMirror with Trident protect, so monitoring their state with Prometheus is essential.
This example config includes snapshot, backup, execHooksRun, appvault, and AMR metrics:
$ cat ./metrics-config-backup-snapshot-hooks-appvault-amr.yaml
extraArgs:
# collect only our metrics, not the defaults ones (deployments etc.)
- --custom-resource-state-only=true
customResourceState:
enabled: true
config:
kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Snapshot"
version: "v1"
labelsFromPath:
snapshot_uid: [metadata, uid]
snapshot_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: snapshot_info
help: "Exposes details about the Snapshot state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
error: [status, error]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Backup"
version: "v1"
labelsFromPath:
backup_uid: [metadata, uid]
backup_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: backup_info
help: "Exposes details about the Backup state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
status: [status, state]
error: [status, error]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "Exechooksruns"
version: "v1"
labelsFromPath:
ehr_uid: [metadata, uid]
ehr_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: ehr_info
help: "Exposes details about the Exec Hook state"
each:
type: Info
info:
labelsFromPath:
appVaultReference: ["spec", "appVaultRef"]
appReference: ["spec", "applicationRef"]
stage: ["spec", stage]
action: ["spec", action]
status: [status, state]
error: [status, error]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "AppVault"
version: "v1"
labelsFromPath:
appvault_uid: [metadata, uid]
appvault_name: [metadata, name]
metricsFromPath:
state: [status, state]
error: [status, error]
message: [status, message]
metrics:
- name: appvault_info
help: "Exposes details about the AppVault state"
each:
type: Info
info:
labelsFromPath:
state: [status, state]
error: [status, error]
message: [status, message]
- groupVersionKind:
group: protect.trident.netapp.io
kind: "AppMirrorRelationship"
version: "v1"
labelsFromPath:
amr_uid: [metadata, uid]
amr_name: [metadata, name]
creation_time: [metadata, creationTimestamp]
metrics:
- name: app_mirror_relationship_info
help: "Exposes details about the AppMirrorRelationship state"
each:
type: Info
info:
labelsFromPath:
desiredState: ["spec", "desiredState"]
destinationAppVaultRef: ["spec", "destinationAppVaultRef"]
sourceAppVaultRef: ["spec", "sourceAppVaultRef"]
sourceApplicationName: ["spec", "sourceApplicationName"]
sourceApplicationUID: ["spec", "sourceApplicationUID"]
state: ["status", "state"]
error: ["status", "error"]
lastTransferStartTimestamp: ["status", "lastTransfer", "startTimestamp"]
lastTransferCompletionTimestamp: ["status", "lastTransfer", "completionTimestamp"]
lastTransferredSnapshotName: ["status", "lastTransferredSnapshot", "name"]
lastTransferredSnapshotCompletionTimestamp: ["status", "lastTransferredSnapshot", "completionTimestamp"]
destinationApplicationRef: ["status", "destinationApplicationRef"]
destinationNamespaces: ["status", "destinationNamespaces"]
promotedSnapshot: ["spec", "promotedSnapshot"]
recurrenceRule: ["spec", "recurrenceRule"]
storageClassName: ["spec", "storageClassName"]
namespaceMapping: ["spec", "namespaceMapping"]
conditions: ["status", "conditions"]
rbac:
extraRules:
- apiGroups: ["protect.trident.netapp.io"]
resources: ["snapshots"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["backups"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["exechooksruns"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["appvaults"]
verbs: ["list", "watch"]
- apiGroups: ["protect.trident.netapp.io"]
resources: ["appmirrorrelationships"]
verbs: ["list", "watch"]
# collect metrics from ALL namespaces
namespaces: ""
# deploy a ServiceMonitor so the metrics are collected by Prometheus
prometheus:
monitor:
enabled: true
additionalLabels:
release: prometheus
Trident metrics
The metrics provided by Trident enable you to do the following:
Keep tabs on Trident's health and configuration. You can examine how successful operations are and if it can communicate with the backends as expected.
Examine backend usage information and understand how many volumes are provisioned on a backend and the amount of space consumed, and so on.
Maintain a mapping of the number of volumes provisioned on available backends.
Track performance. You can look at how long it takes for Trident to communicate to backends and perform operations.
By default, Trident's metrics are exposed on the target port 8001 at the /metrics endpoint. These metrics are enabled by default when Trident is installed.
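To verify this quickly before wiring up Prometheus, we can port-forward the metrics port of the trident-csi service and list the exposed metric names (a sketch that assumes the default trident namespace and service name, referencing the service port by its name, metrics):
$ kubectl -n trident port-forward svc/trident-csi 8001:metrics &
$ curl -s http://localhost:8001/metrics | grep -E '^trident_' | cut -d'{' -f1 | sort -u | head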
Create a Prometheus ServiceMonitor for Trident metrics
Prometheus was already set up in the previous sections, so to consume the Trident metrics, we create another Prometheus ServiceMonitor that watches the trident-csi service and scrapes its metrics port. A sample ServiceMonitor configuration looks like this:
$ cat ./prometheus-trident-sm.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: trident-sm
namespace: prometheus
labels:
release: prom-operator
spec:
jobLabel: trident
selector:
matchLabels:
app: controller.csi.trident.netapp.io
namespaceSelector:
matchNames:
- trident
endpoints:
- port: metrics
interval: 15s
Let’s deploy the new ServiceMonitor in the prometheus namespace.
$ kubectl apply -f Prometheus/prometheus-trident-sm.yaml
servicemonitor.monitoring.coreos.com/trident-sm created
We can see that the new ServiceMonitor trident-sm is now present in the prometheus namespace:
$ kubectl -n prometheus get all,ServiceMonitor,cm
NAME READY STATUS RESTARTS AGE
pod/prometheus-operator-5d697c648f-22lrz 1/1 Running 0 6h1m
pod/prometheus-prometheus-0 2/2 Running 0 5h55m
pod/trident-protect-kube-state-metrics-99476b548-cv9ff 1/1 Running 0 28m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-operated ClusterIP None <none> 9090/TCP 5h55m
service/prometheus-operator ClusterIP None <none> 8080/TCP 6h1m
service/trident-protect-kube-state-metrics ClusterIP 172.16.88.31 <none> 8080/TCP 23h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-operator 1/1 1 1 6h1m
deployment.apps/trident-protect-kube-state-metrics 1/1 1 1 23h
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-operator-5d697c648f 1 1 1 6h1m
replicaset.apps/trident-protect-kube-state-metrics-94d55666c 0 0 0 23h
replicaset.apps/trident-protect-kube-state-metrics-99476b548 1 1 1 28m
NAME READY AGE
statefulset.apps/prometheus-prometheus 1/1 5h55m
NAME AGE
servicemonitor.monitoring.coreos.com/trident-protect-kube-state-metrics 23h
servicemonitor.monitoring.coreos.com/trident-sm 32s
NAME DATA AGE
configmap/kube-root-ca.crt 1 24h
configmap/prometheus-prometheus-rulefiles-0 0 5h55m
configmap/trident-protect-kube-state-metrics-customresourcestate-config 1 23h
By checking for available targets in the Prometheus UI (http://localhost:9090/targets) we confirm that the Trident metrics are now available in Prometheus.
Query Trident metrics
We can now query the available Trident metrics in Prometheus.
For example, we can query the number of Trident snapshots and volumes and the bytes allocated by Trident volumes in the Prometheus UI.
Grafana dashboards
Now that our monitoring system is functional, it’s time to give you an idea of how to visualize the monitoring results. Let’s investigate Grafana dashboards!
Install Grafana
We install Grafana using the Grafana helm charts, first adding the Grafana helm repository:
$ helm repo add grafana https://grafana.github.io/helm-charts
Then we can install Grafana into the namespace grafana, which we create first.
$ kubectl create ns grafana
namespace/grafana created
$ helm install my-grafana grafana/grafana --namespace grafana
NAME: my-grafana
LAST DEPLOYED: Thu Aug 21 14:28:14 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:
kubectl get secret --namespace grafana my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
my-grafana.grafana.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 3000
3. Login with the password from step 1 and the username: admin
#################################################################################
###### WARNING: Persistence is disabled!!! You will lose your data when #####
###### the Grafana pod is terminated. #####
#################################################################################
$ helm list -n grafana
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
my-grafana grafana 1 2025-08-21 14:28:14.772879 +0200 CEST deployed grafana-9.3.2 12.1.0
Following the instructions above, we retrieve the Grafana admin password and set up port forwarding.
$ kubectl get secret --namespace grafana my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
<REDACTED>
$ kubectl -n grafana port-forward svc/my-grafana 3000:80
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
Now we can test access and log in to the Grafana UI on http://localhost:3000, which works fine.
Enable persistent storage for Grafana
By default, Grafana uses only ephemeral storage, storing all data in the container’s file system, so the data is lost if the container stops. We follow the steps in the Grafana documentation to enable persistent storage for Grafana.
We download the values file and edit the persistence section, changing the enabled flag from false to true.
$ diff Grafana/values.yaml Grafana/values-persistence.yaml
418c418
< enabled: false
---
> enabled: true
Then we run helm upgrade to make the changes take effect.
$ helm upgrade my-grafana grafana/grafana -f Grafana/values-persistence.yaml -n grafana
Release "my-grafana" has been upgraded. Happy Helming!
NAME: my-grafana
LAST DEPLOYED: Thu Aug 21 14:37:24 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 2
NOTES:
1. Get your 'admin' user password by running:
kubectl get secret --namespace grafana my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
my-grafana.grafana.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 3000
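As an aside, the same persistence change could also be applied without maintaining an edited values file, for example:
$ helm upgrade my-grafana grafana/grafana -n grafana --reuse-values --set persistence.enabled=true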
We confirm that a PVC backed by Azure NetApp Files was created in the grafana namespace.
$ kubectl get all,pvc -n grafana
NAME READY STATUS RESTARTS AGE
pod/my-grafana-6d5b96b7d7-fqq7d 1/1 Running 0 5m18s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/my-grafana ClusterIP 172.16.9.115 <none> 80/TCP 14m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/my-grafana 1/1 1 1 14m
NAME DESIRED CURRENT READY AGE
replicaset.apps/my-grafana-6ccff48567 0 0 0 14m
replicaset.apps/my-grafana-6d5b96b7d7 1 1 1 5m18s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/my-grafana Bound pvc-5a1844c6-3a9f-4f1d-9d94-caa1666ded3e 50Gi RWO azure-netapp-files-standard <unset> 5m19s
After restarting the port forwarding, we can log in to Grafana again and continue working with persistent storage enabled.
Add a data source
Next, we need to add our Prometheus instance as a data source in Grafana. To do this, we need the Prometheus service name and port. When using the Prometheus operator, the service name is typically prometheus-operated, so we check on our cluster:
$ kubectl -n prometheus get svc | grep operated
prometheus-operated ClusterIP None <none> 9090/TCP 27h
Now we can add the Prometheus instance as a data source in Grafana. Use the Kubernetes DNS name to reference the Prometheus service; it should look something like this: http://prometheus-operated.prometheus.svc.cluster.local:9090
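Alternatively, instead of adding the data source manually in the UI, it can be provisioned through the Grafana Helm chart values. This is a sketch based on the datasources value of the grafana/grafana chart used above; the file name grafana-datasource.yaml is just an example.
$ cat ./grafana-datasource.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-operated.prometheus.svc.cluster.local:9090
        isDefault: true
$ helm upgrade my-grafana grafana/grafana -f Grafana/values-persistence.yaml -f ./grafana-datasource.yaml -n grafana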
In the Grafana dashboard, we navigate to Menu -> Drilldown, which allows us to easily see the Trident and KSM Trident protect metrics.
Add a dashboard for the Trident protect metrics
Covering the creation of Grafana dashboards in detail is beyond the scope of this blog post. As an example and for inspiration, we use the dashboard for visualizing snapshot and backup metrics from Yves Weisser’s highly recommended collection of Trident lab scenarios on GitHub.
After downloading the dashboard JSON file from GitHub, we change the “Failed” option values to “Error” so that failed snapshots and backups are displayed in red in the dashboard.
$ diff Grafana/dashboard.json Grafana/dashboard_v2.json
365c365
< "Failed": {
---
> "Error": {
562c562
< "Failed": {
---
> "Error": {
709c709,710
< "25.02"
---
> "25.02",
> "25.06"
724c725
< }
\ No newline at end of file
---
> }
Now we can import the dashboard JSON file into Grafana.
After importing the dashboard JSON file, the “Trident protect Global View” dashboard is available in Grafana. Here’s an example of how it visualizes running and failed Trident protect backups.
Conclusion and call to action
By following this blog, you have successfully set up monitoring and visualization for NetApp Trident and Trident protect using Prometheus and Grafana. This setup enables you to keep tabs on the health and performance of your Trident and Trident protect resources, ensuring your Kubernetes applications are well-protected and efficiently managed.
Happy monitoring!
NetApp® Console is built on a restructured and enhanced foundation, encompassing platform investments, core services, enterprise readiness, security, administration, and functionality. It is now the intuitive, intelligent, highly secure, and compliant single point of control for seamless management of your NetApp intelligent data infrastructure. With its reimagined user interface and experience, managing your NetApp data services and NetApp storage has never been more intuitive, smart, or insightful.
Microsoft has announced several new features for Azure NetApp Files (ANF). These updates bring meaningful improvements in performance, flexibility, security, and data mobility—making ANF an even more capable solution for organizations running demanding workloads in the cloud.
Whether you're managing infrastructure, supporting hybrid environments, or navigating compliance requirements, these enhancements are designed to help your organization operate more efficiently and securely.
Improved Data Mobility and Access
Two powerful data mobility features are now available:
Azure NetApp Files Cache Volumes
Azure NetApp Files Migration Assistant
Cache Volumes, built on NetApp’s ONTAP® FlexCache® technology, introduce a persistent, high-performance cache in Azure for origin volumes located outside ANF. This means active data can be accessed faster and more efficiently—even across WAN connections. For distributed teams or hybrid architectures, this capability enables low-latency access to critical files without duplicating entire datasets.
The Migration Assistant streamlines the process of moving data from on-premises ONTAP environments to Azure. It preserves metadata and minimizes downtime, helping your organization reduce migration complexity and network costs.
Flexible Pricing and Performance Optimization
Three new features that give organizations more control over cost and performance are now generally available (GA):
Flexible Service Level
Flexible Service Level with Cool Access
Short-Term Clones
With flexible service levels, you can dynamically adjust performance tiers based on workload needs—scaling up for high-performance tasks or scaling down to save costs during quieter periods. The addition of cool access tiers allows you to store infrequently accessed data at a lower cost, while maintaining availability when needed.
Short-term clones are ideal for development, testing, and analytics. These space-efficient, temporary copies allow teams to work with production-like data without consuming large amounts of storage, accelerating innovation while keeping costs in check.
Simplified VMware Integration
Azure NetApp Files now supports datastore integration with Azure VMware Solution (AVS) Generation 2, and notably, this no longer requires ExpressRoute.
This update simplifies deployment for organizations using AVS, making it easier to migrate VMware workloads to Azure. With ANF providing high-performance storage, teams can expect improved responsiveness and reliability for their virtualized environments—without the complexity of additional networking infrastructure.
Enhanced Security and Visibility
Security and compliance are top priorities for many organizations, and ANF’s latest updates deliver greater control and transparency:
Cross-Tenant Customer-Managed Keys for Volume Encryption
File Access Logs
With cross-tenant encryption, your organization can manage its own encryption keys—even in multi-tenant scenarios—ensuring data protection policies remain under your control. This is especially important for regulated industries or environments with strict governance requirements.
File Access Logs provide detailed visibility into file-level operations. These logs support audit trails, help detect unusual access patterns, and enable forensic analysis—making it easier to meet compliance standards and maintain operational integrity.
Who Benefits Most from These Updates?
These new features are particularly valuable for:
Organizations modernizing infrastructure or migrating to AVS: The VMware integration and migration tools simplify transitions and reduce friction.
Teams with strict security and compliance requirements: Enhanced encryption and logging capabilities support governance and regulatory needs.
IT departments looking to optimize storage costs and performance: Flexible service levels, cool access tiers, and cache volumes deliver measurable efficiency gains.
Whether you're supporting enterprise applications, managing hybrid cloud environments, or enabling global collaboration, these enhancements to Azure NetApp Files offer practical tools to improve performance, reduce costs, and strengthen security.
Next Steps
If your organization is already using ANF or considering it as part of your cloud strategy, these new features are worth exploring. They offer tangible benefits across infrastructure, operations, and compliance—helping you get more value from your cloud investments.
Would you like help evaluating how these features could fit into your current environment or roadmap?
Let’s talk: https://www.netapp.com/azure/contact/
Enterprise artificial intelligence represents one of the most transformative opportunities in modern business. Yet for many organizations, the path to AI implementation remains fraught with challenges. Data silos, security concerns, and the complexity of moving massive datasets to the cloud create barriers that can delay or derail AI initiatives entirely.
Microsoft and NetApp’s innovative approach changes this paradigm. By enabling organizations to leverage Microsoft Azure AI services directly on their existing NetApp data infrastructure, whether on-premises or in the cloud, traditional bottlenecks are removed and time-to-insight is accelerated. This breakthrough capability transforms how enterprises approach AI, making it more accessible, secure, and cost-effective than ever before.
The Enterprise AI Imperative
The statistics paint a compelling picture. According to recent industry research, 82% of enterprises want to leverage their data with generative AI [1], while Forrester predicts a 50% boost in productivity and creative problem-solving from enterprise AI initiatives [2]. Perhaps most striking, 60% of workers will use their own AI tools to perform tasks [3], highlighting the urgent need for enterprise-grade AI infrastructure.
This surge in demand creates both opportunity and challenge. Organizations recognize AI's potential to revolutionize operations, enhance customer experiences, and drive competitive advantage. However, they also face significant obstacles in implementing AI solutions effectively.
Breaking Down Traditional Barriers
Traditional AI implementations require organizations to extract, transform, and move data to specialized platforms—a process that introduces risk, complexity, and cost. Data movement creates security vulnerabilities, compliance challenges, and often results in incomplete and out-of-sync datasets that compromise AI model accuracy.
Microsoft and NetApp’s approach eliminates these friction points through intelligent data infrastructure that brings AI capabilities directly to your data, regardless of location. This fundamental shift enables organizations to:
Maintain Data Sovereignty: Keep sensitive information exactly where it belongs while accessing powerful AI capabilities
Reduce Infrastructure Costs: Eliminate expensive data migration projects and redundant storage requirements
Accelerate Implementation: Launch AI initiatives in weeks rather than months or years
Ensure Compliance: Meet regulatory requirements by maintaining data residency and control
Azure NetApp Files: The Foundation for Intelligent Data Infrastructure
Azure NetApp Files serves as the cornerstone of this revolutionary approach. As a fully managed Microsoft storage service, it provides enterprise-grade performance, security, and scalability while maintaining seamless integration with Azure's comprehensive AI ecosystem.
The platform's architecture enables direct connectivity between your data and Azure's most powerful AI services, including Azure AI Foundry, Azure OpenAI, Azure AI Search, and Microsoft Fabric. This integration creates unprecedented opportunities for data utilization and insight generation.
Diagram: Unified Data Access & Intelligent Search Across
“Our clients are eager to tap into Azure AI, but success starts with having the right data foundation. NetApp’s integration with Azure allows Trace3 to help organizations streamline data management so they can focus on building impactful AI outcomes.”
— Jason Achten, Principal Cloud Architect, Digital Solutions, Trace3
Key Capabilities Driving Enterprise Success
Simplified Integration: The Object REST API unleashes the full power of Azure AI and data services on your enterprise data. This breakthrough eliminates complex data pipelines and enables direct access to AI capabilities through standardized interfaces.
Enhanced Productivity: Organizations achieve dramatic efficiency gains by combining existing enterprise data with Azure's robust AI and analytics services. Teams can build sophisticated models, generate insights, and create intelligent applications without data engineering overhead.
Robust Security Protocols: Azure NetApp Files maintains enterprise-grade security throughout the AI workflow. Data remains protected by comprehensive encryption, access controls, and monitoring capabilities that exceed industry standards.
Unlimited Scalability: Handle growing data volumes and increasing AI workloads without infrastructure constraints. The platform scales seamlessly from terabytes to petabytes while maintaining consistent performance.
Real-World Implementation Scenarios
Intelligent Document Processing
Consider a global financial services firm with millions of contracts, policies, and regulatory documents stored across multiple locations. Traditional approaches would require months of data migration and transformation before AI implementation could begin.
With NetApp's solution, the organization connects Azure AI Search and Azure OpenAI directly to existing document repositories. The AI services can immediately begin indexing, analyzing, and extracting insights from documents while they remain in their original locations. This approach enables rapid deployment of intelligent search capabilities, automated compliance checking, and contract analysis—all without moving a single file.
Advanced Analytics and Business Intelligence
Manufacturing organizations generate massive amounts of operational data across production facilities worldwide. A leading manufacturer leveraged Azure NetApp Files to connect Azure Databricks directly to production data.
The implementation enabled real-time quality analysis, predictive maintenance scheduling, and supply chain optimization. By keeping data in place, the organization maintained operational continuity while gaining powerful AI-driven insights that improved efficiency by 30% and reduced maintenance costs by 25%.
Diagram: Real-Time Analytics for AI/ML Workloads using Azure Databricks
Generative AI for Customer Service
Retail organizations can transform customer service by connecting Azure OpenAI to product catalogs, customer interaction histories, and knowledge bases stored on NetApp infrastructure. This integration enables intelligent chatbots and virtual assistants that provide accurate, contextually relevant responses while maintaining customer data privacy.
The solution processes thousands of concurrent interactions while continuously learning from new data, improving response quality and customer satisfaction scores without requiring data migration or system disruption.
The Strategic Advantage of Data-in-Place AI
Microsoft and NetApp’s approach delivers measurable business outcomes that extend far beyond technical capabilities:
Accelerated Time-to-Value: Organizations can deploy AI solutions in weeks rather than months, capturing business value immediately while competitors struggle with complex implementations.
Reduced Total Cost of Ownership: Eliminating data movement, duplicate storage, and complex integration projects significantly reduces infrastructure costs and operational overhead.
Enhanced Data Governance: Maintaining data in its original location preserves existing governance frameworks, compliance protocols, and security controls.
Improved Model Accuracy: AI models trained on complete, unmodified datasets deliver superior performance compared to models built on migrated or transformed data subsets.
Looking Forward: The Seamless Data and AI Integration Revolution
The seamless integration of NetApp data with Azure AI and data services represents a quantum leap in enterprise AI accessibility. This breakthrough technology enables compatible access to NetApp storage systems, creating seamless integration with virtually any AI or analytics service.
Organizations will be able to build sophisticated AI pipelines using familiar tools and protocols while maintaining complete control over data location, security, and governance. The API supports both on-premises ONTAP systems and Azure NetApp Files, ensuring consistent capabilities across hybrid cloud environments.
This innovation positions NetApp customers at the forefront of the AI revolution, with infrastructure that adapts to emerging technologies while protecting existing investments.
Building Your AI-Ready Infrastructure
Success in the AI era requires intelligent data infrastructure that balances innovation with security, performance with cost-effectiveness, and scalability with simplicity. NetApp's comprehensive approach addresses all these requirements through proven technology and strategic partnerships.
Organizations implementing NetApp's AI-ready infrastructure report significant improvements in operational efficiency, decision-making speed, and competitive positioning. They achieve these outcomes while maintaining the highest standards of data security and regulatory compliance.
The convergence of NetApp's data management expertise with Microsoft's AI leadership creates unprecedented opportunities for enterprise transformation. Organizations that embrace this combination position themselves for sustained success in an increasingly AI-driven marketplace.
The future belongs to organizations that can harness the full power of their data assets. With NetApp and Microsoft Azure AI, that future is available today. To explore more about Azure NetApp Files go here: https://www.netapp.com/azure/azure-netapp-files/
[1] “AI Transformation Study,” IDC, January 2024
[2] “Forecast Analysis: AI-Optimized Servers, Worldwide,” Gartner, November 2024
[3] “Scaling AI Initiatives Responsibly: The Critical Role of an Intelligent Data Infrastructure,” IDC, May 2024