Tech ONTAP Blogs

NetApp Trident protect alerting

PatricU
NetApp

NetApp® Trident™ protect provides advanced application data management capabilities that enhance the functionality and availability of stateful Kubernetes applications supported by NetApp ONTAP storage systems and the NetApp Trident Container Storage Interface (CSI) storage provisioner. It is compatible with a wide range of fully managed and self-managed Kubernetes offerings (see the supported Kubernetes distributions and storage back ends), making it an optimal solution for protecting your Kubernetes services across various platforms and regions.

 

In a recent blog post, we demonstrated how to monitor Trident protect using the popular open-source monitoring and visualization frameworks Prometheus and Grafana by scraping metrics provided by Trident protect.

Now, in this blog post, I'll show you how to leverage Grafana's alerting capabilities to send alerts when something goes wrong, for example when a Trident protect backup fails.

Prerequisites

To follow along with this guide, ensure you have the following:

  • A Kubernetes cluster with the latest versions of Trident and Trident protect installed, and their associated kubeconfig files
  • A NetApp ONTAP storage back end and Trident with configured storage back ends, storage classes, and volume snapshot classes
  • A configured object storage bucket for storing backups and metadata information, with bucket replication configured
  • A workstation with kubectl installed and configured to use the cluster's kubeconfig file
  • The tridentctl-protect CLI of Trident protect installed on your workstation
  • Admin user permissions on the Kubernetes cluster
  • Prometheus and Grafana installed and configured on your Kubernetes cluster as outlined in the NetApp Trident protect metrics and monitoring blog

Throughout this blog, we largely follow the Get started with Grafana Alerting tutorial to demonstrate Grafana alerting with Trident protect.

Creating a contact point for notifications

The first step is to create a contact point in Grafana, so that our alerts send their notifications somewhere. Contact points contain the configuration for sending alert notifications, including destinations such as email, Slack, IRM, or webhooks, and their notification messages. For our demonstration purposes, we use the webhook integration, as outlined in the Grafana alerting tutorial. To get an endpoint that receives the alerts, we can use Webhook.site to easily set up a test endpoint.

In a browser window, we navigate to https://webhook.site/ and copy our unique URL (https://webhook.site/e5f1b116-3ce7-45e6-b773-05fc5136e080).

Screenshot 2025-10-28 at 15.33.54.png
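
Optionally, we can verify from the workstation that the endpoint accepts requests before wiring it into Grafana. A minimal sketch using our demo URL (any request body will do):

$ curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"message": "connectivity test from workstation"}' \
    https://webhook.site/e5f1b116-3ce7-45e6-b773-05fc5136e080

The request then appears as an additional POST entry on the webhook.site page.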

Our webhook endpoint https://webhook.site/e5f1b116-3ce7-45e6-b773-05fc5136e080 is now ready to accept requests, and we can create a contact point in Grafana. In another browser tab, we sign in to our Grafana account, hover over the Alerting (bell) icon in the sidebar, and click Contact points.

Screenshot 2025-10-28 at 15.34.48.png

After clicking + Create contact point, we enter Webhook as the Name and choose Webhook as the Integration. In the URL field, we paste our webhook endpoint https://webhook.site/e5f1b116-3ce7-45e6-b773-05fc5136e080.

Screenshot 2025-10-28 at 15.36.00.png

Let’s send two test alerts to our webhook endpoint by clicking Test, then Send test notification twice.

Screenshot 2025-10-28 at 15.36.08.png

On webhook.site, we can now see two POST / entries. We click one of them to see what information Grafana sent.

Screenshot 2025-10-28 at 15.36.32.png

Back in Grafana, we click Save contact point.

Screenshot 2025-10-28 at 15.36.42.png

We have now set up a dummy webhook endpoint and created a new Alerting contact point in Grafana. Next, we can create alert rules and link them to this new integration.
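
As an alternative to the UI, the same contact point could also be created through Grafana's Alerting provisioning API, which is handy if you want to keep the configuration in scripts or version control. A hedged sketch, assuming a service account token in GRAFANA_TOKEN and Grafana reachable at localhost:3000:

$ curl -s -X POST http://localhost:3000/api/v1/provisioning/contact-points \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "Webhook",
      "type": "webhook",
      "settings": {
        "url": "https://webhook.site/e5f1b116-3ce7-45e6-b773-05fc5136e080"
      }
    }'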

Alert rules

An alert rule in Grafana consists of one or more queries and expressions that select the data we want to measure. It contains a condition to trigger the alert, an evaluation period that determines how often the rule is evaluated, and additional options to manage alert events and their notifications.

As the data source for the Grafana alerts, we'll use Prometheus, set up to scrape the Trident protect kube-state-metrics as described in the NetApp Trident protect metrics and monitoring blog.
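
Before building the alert rules, it's worth confirming that Prometheus actually serves the Trident protect custom-resource metrics. A quick check via the Prometheus HTTP API; the namespace and service name below are assumptions that depend on how Prometheus was installed in your cluster:

$ # adjust namespace and service name to your Prometheus installation
$ kubectl -n prometheus port-forward svc/prometheus-operated 9090:9090 &
$ curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=kube_customresource_appvault_info'

If the query returns a series per appVault CR with its state label, Prometheus is scraping the Trident protect kube-state-metrics as expected.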

 

In general, the status.state field of Trident protect CRs can take the following values, although each CR has its own logic that determines which of these states it actually uses:

  • "Blocked"
  • "Running"
  • "Terminating"
  • "Error"
  • "Failed"
  • "Completed"
  • "RollingBack"
  • "Timeout"
  • "Unknown"
  • "Removed"
  • "Available"

CRs with a terminal state, such as Snapshot and Backup CRs, use these values (Completed, Failed, and so on); the states you will see most often are Running, Completed, Blocked, Error, and Failed. Of these, Failed is terminal, whereas Error indicates a temporary problem that may still be rectified in future reconciliations. A quick kubectl check of these states follows after the note below.

Note: The AppMirrorRelationship CR, which controls NetApp SnapMirror replication relationships in Trident protect, does not use the values above and follows a different pattern, since it is a long-lived CR.
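
For a quick ad-hoc view of these states outside of Prometheus, the status.state field can also be read directly with kubectl. A minimal sketch, assuming the default plural resource names of the Trident protect CRDs (appvaults and backups in the protect.trident.netapp.io API group):

$ kubectl get appvaults.protect.trident.netapp.io -A \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATE:.status.state
$ kubectl get backups.protect.trident.netapp.io -A \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATE:.status.state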

Alert rule example for appVault failure

We want to be alerted if a Trident protect appVault custom resource, which represents the object storage holding the Trident protect backups, fails. Let's set up an alert rule that notifies us in that case.

In Grafana, we navigate to Alerts -> Alerting -> Alert rules and click on + New alert rule.

Screenshot 2025-10-28 at 15.55.42.png

Next, we enter “Failed appVault” as the name for the alert rule. As the data source, we select the already configured Prometheus akspu-test1 data source from the drop-down menu and choose the kube_customresource_appvault_info metric from the drop-down list of available metrics.

Screenshot 2025-10-28 at 15.58.43.png

We use the default options for Grafana-managed alert rule creation. The default options let us define the query, an expression used to manipulate the data (the WHEN field in the UI), and the condition that must be met for the alert to be triggered (in the default mode, a threshold).

As we want to be alerted if any appVault CR is not in the Available state, we set the label filter to state != Available in the Metric section.

Screenshot 2025-10-28 at 15.59.39.png
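
Under the hood, this corresponds to the PromQL selector kube_customresource_appvault_info{state!="Available"}. With the port-forward from above still running, we can preview which series the alert query would match:

$ curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=kube_customresource_appvault_info{state!="Available"}'

As long as all appVault CRs are healthy, the result list is empty; a series only appears once an appVault leaves the Available state.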

Next, we need to create a folder that contains our alert rules. We name it "Trident protect" and click Create.

Screenshot 2025-10-28 at 16.11.49.png

The alert rule evaluation defines the conditions under which an alert rule triggers, based on the following settings:

  • Evaluation group: every alert rule is assigned to an evaluation group. We create a new evaluation group "Trident protect".
  • Evaluation interval: determines how frequently the alert rule is checked. We set it to 1m.
  • Pending period: how long the condition must be met to trigger the alert rule. We set it to 1m.
  • Keep firing for: defines how long an alert should remain in the Firing state after the alert condition stops being true. We set it to 0s, so the alert stops firing immediately after the condition is no longer true.

Screenshot 2025-10-28 at 16.12.22.png

The last step in creating the alert rule is to configure the notifications. We choose the previously created Webhook contact point from the drop-down list and click Save to create the alert rule.

Screenshot 2025-10-28 at 16.12.49.png
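
If you want to keep the alert rules under version control as well, recent Grafana releases can export them in the file-provisioning format through the provisioning API. A hedged sketch; endpoint availability and the exact output format depend on your Grafana version:

$ curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
    "http://localhost:3000/api/v1/provisioning/alert-rules/export?format=yaml"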

With the alert rule for failed appVault CRs in place, let’s try it out.

To simulate a failure of the appVault CR demo, we can simply delete the secret that contains the access credentials to the corresponding object storage bucket in the trident-protect namespace:

$ kubectl -n trident-protect delete secret puneptunetest
secret "puneptunetest" deleted from trident-protect namespace

Once Trident protect detects that the object storage behind the demo appVault CR is no longer accessible, Prometheus picks up the Error state:

Screenshot 2025-10-28 at 16.31.43.png

Grafana fires an alert once the evaluation interval of 1m has elapsed, and we receive an alert notification at the webhook endpoint:

Screenshot 2025-10-28 at 17.41.51.png

The alert notification details show that the alert rule is firing and include the value that triggered the rule by exceeding the threshold of the alert condition. The notification also contains a link to the alert rule details and another link to add a Silence to it:

{
	"receiver": "Webhook",
	"status": "firing",
	"alerts": [
		{
			"status": "firing",
			"labels": {
				"alertname": "Failed appVault",
				"appvault_name": "demo",
				"appvault_uid": "1d7b7238-4041-4c52-b5d8-f176b301df3d",
				"container": "kube-state-metrics",
				"customresource_group": "protect.trident.netapp.io",
				"customresource_kind": "AppVault",
				"customresource_version": "v1",
				"endpoint": "http",
				"error": "failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found",
				"grafana_folder": "Trident protect",
				"instance": "172.18.0.222:8080",
				"job": "kube-state-metrics",
				"namespace": "prometheus",
				"pod": "trident-protect-kube-state-metrics-c74f969cb-lp67z",
				"service": "trident-protect-kube-state-metrics",
				"state": "Error"
			},
			"annotations": {},
			"startsAt": "2025-10-28T15:27:50Z",
			"endsAt": "0001-01-01T00:00:00Z",
			"generatorURL": "http://localhost:3000/alerting/grafana/af2fab7y1pukgf/view?orgId=1",
			"fingerprint": "fa6ccf867636d207",
			"silenceURL": "http://localhost:3000/alerting/silence/new?alertmanager=grafana\u0026matcher=__alert_rule_uid__%3Daf2fab7y1pukgf\u0026matcher=appvault_name%3Ddemo\u0026matcher=appvault_uid%3D1d7b7238-4041-4c52-b5d8-f176b301df3d\u0026matcher=container%3Dkube-state-metrics\u0026matcher=customresource_group%3Dprotect.trident.netapp.io\u0026matcher=customresource_kind%3DAppVault\u0026matcher=customresource_version%3Dv1\u0026matcher=endpoint%3Dhttp\u0026matcher=error%3Dfailed+to+resolve+value+for+accountKey%3A+unable+to+get+secret+trident-protect%2Fpuneptunetest%3A+Secret+%22puneptunetest%22+not+found\u0026matcher=instance%3D172.18.0.222%3A8080\u0026matcher=job%3Dkube-state-metrics\u0026matcher=namespace%3Dprometheus\u0026matcher=pod%3Dtrident-protect-kube-state-metrics-c74f969cb-lp67z\u0026matcher=service%3Dtrident-protect-kube-state-metrics\u0026matcher=state%3DError\u0026orgId=1",
			"dashboardURL": "",
			"panelURL": "",
			"values": {
				"A": 1,
				"C": 1
			},
			"valueString": "[ var='A' labels={__name__=kube_customresource_appvault_info, appvault_name=demo, appvault_uid=1d7b7238-4041-4c52-b5d8-f176b301df3d, container=kube-state-metrics, customresource_group=protect.trident.netapp.io, customresource_kind=AppVault, customresource_version=v1, endpoint=http, error=failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found, instance=172.18.0.222:8080, job=kube-state-metrics, namespace=prometheus, pod=trident-protect-kube-state-metrics-c74f969cb-lp67z, service=trident-protect-kube-state-metrics, state=Error} value=1 ], [ var='C' labels={__name__=kube_customresource_appvault_info, appvault_name=demo, appvault_uid=1d7b7238-4041-4c52-b5d8-f176b301df3d, container=kube-state-metrics, customresource_group=protect.trident.netapp.io, customresource_kind=AppVault, customresource_version=v1, endpoint=http, error=failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found, instance=172.18.0.222:8080, job=kube-state-metrics, namespace=prometheus, pod=trident-protect-kube-state-metrics-c74f969cb-lp67z, service=trident-protect-kube-state-metrics, state=Error} value=1 ]",
			"orgId": 1
		}
	],
	"groupLabels": {
		"alertname": "Failed appVault",
		"grafana_folder": "Trident protect"
	},
	"commonLabels": {
		"alertname": "Failed appVault",
		"appvault_name": "demo",
		"appvault_uid": "1d7b7238-4041-4c52-b5d8-f176b301df3d",
		"container": "kube-state-metrics",
		"customresource_group": "protect.trident.netapp.io",
		"customresource_kind": "AppVault",
		"customresource_version": "v1",
		"endpoint": "http",
		"error": "failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found",
		"grafana_folder": "Trident protect",
		"instance": "172.18.0.222:8080",
		"job": "kube-state-metrics",
		"namespace": "prometheus",
		"pod": "trident-protect-kube-state-metrics-c74f969cb-lp67z",
		"service": "trident-protect-kube-state-metrics",
		"state": "Error"
	},
	"commonAnnotations": {},
	"externalURL": "http://localhost:3000/",
	"version": "1",
	"groupKey": "{}/{__grafana_autogenerated__=\"true\"}/{__grafana_receiver__=\"Webhook\"}:{alertname=\"Failed appVault\", grafana_folder=\"Trident protect\"}",
	"truncatedAlerts": 0,
	"orgId": 1,
	"title": "[FIRING:1] Failed appVault Trident protect (demo 1d7b7238-4041-4c52-b5d8-f176b301df3d kube-state-metrics protect.trident.netapp.io AppVault v1 http failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found 172.18.0.222:8080 kube-state-metrics prometheus trident-protect-kube-state-metrics-c74f969cb-lp67z trident-protect-kube-state-metrics Error)",
	"state": "alerting",
	"message": "**Firing**\n\nValue: A=1, C=1\nLabels:\n - alertname = Failed appVault\n - appvault_name = demo\n - appvault_uid = 1d7b7238-4041-4c52-b5d8-f176b301df3d\n - container = kube-state-metrics\n - customresource_group = protect.trident.netapp.io\n - customresource_kind = AppVault\n - customresource_version = v1\n - endpoint = http\n - error = failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found\n - grafana_folder = Trident protect\n - instance = 172.18.0.222:8080\n - job = kube-state-metrics\n - namespace = prometheus\n - pod = trident-protect-kube-state-metrics-c74f969cb-lp67z\n - service = trident-protect-kube-state-metrics\n - state = Error\nAnnotations:\nSource: http://localhost:3000/alerting/grafana/af2fab7y1pukgf/view?orgId=1\nSilence: http://localhost:3000/alerting/silence/new?alertmanager=grafana\u0026matcher=__alert_rule_uid__%3Daf2fab7y1pukgf\u0026matcher=appvault_name%3Ddemo\u0026matcher=appvault_uid%3D1d7b7238-4041-4c52-b5d8-f176b301df3d\u0026matcher=container%3Dkube-state-metrics\u0026matcher=customresource_group%3Dprotect.trident.netapp.io\u0026matcher=customresource_kind%3DAppVault\u0026matcher=customresource_version%3Dv1\u0026matcher=endpoint%3Dhttp\u0026matcher=error%3Dfailed+to+resolve+value+for+accountKey%3A+unable+to+get+secret+trident-protect%2Fpuneptunetest%3A+Secret+%22puneptunetest%22+not+found\u0026matcher=instance%3D172.18.0.222%3A8080\u0026matcher=job%3Dkube-state-metrics\u0026matcher=namespace%3Dprometheus\u0026matcher=pod%3Dtrident-protect-kube-state-metrics-c74f969cb-lp67z\u0026matcher=service%3Dtrident-protect-kube-state-metrics\u0026matcher=state%3DError\u0026orgId=1\n"
}

To resolve the failure, we recreate the secret with the correct credentials.

$ kubectl -n trident-protect create secret generic puneptunetest --from-literal=accountName=puneptunetest --from-literal=<REDACTED>
secret/puneptunetest created

It takes Trident protect a few minutes to move the appVault CR back into the Available state.

$ tridentctl-protect get appvault
+------+----------+-----------+-------+---------+-------+
| NAME | PROVIDER |   STATE   | ERROR | MESSAGE |  AGE  |
+------+----------+-----------+-------+---------+-------+
| demo | Azure    | Available |       |         | 32d7h |
+------+----------+-----------+-------+---------+-------+

Grafana recognizes the state change and sends a notification with the alert status “resolved”.

Screenshot 2025-10-28 at 17.43.39.png

Alert rule example for backup failure

Now we also want to be alerted if any backup fails. Let's create another alert rule in the existing “Trident protect” alert folder.

Screenshot 2025-10-28 at 17.57.54.png

We name the alert rule “Failed backup”, again select the already configured Prometheus akspu-test1 data source from the drop-down menu, and choose the kube_customresource_backup_info metric from the drop-down list of available metrics.

Screenshot 2025-10-28 at 17.58.36.png

Then we set the label filter to status = Error in the Metric section, so that we'll be alerted whenever a backup enters the Error state (the backup metric exposes its state through the status label, as the alert payload below shows).

Screenshot 2025-10-28 at 18.27.23.png

We reuse the Webhook contact point and the same evaluation settings as in the “Failed appVault” alert rule and save the rule.

Screenshot 2025-10-28 at 18.01.39.png
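
As before, the underlying PromQL selector can be previewed against the Prometheus API to confirm that the label filter matches what Trident protect reports:

$ curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=kube_customresource_backup_info{status="Error"}'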

To produce a failing backup, we delete the puneptunetest secret again and then start a backup of the sample application alpine.

$ tridentctl-protect create backup --app alpine --appvault demo -n alpine
Backup "alpine-w2qjpw" created.

The backup fails quickly, and Grafana fires the alert.

$ tridentctl-protect get backup -n alpine
+---------------+--------+----------------+-------+--------------------------------+-------+
|     NAME      |  APP   | RECLAIM POLICY | STATE |             ERROR              |  AGE  |
+---------------+--------+----------------+-------+--------------------------------+-------+
| alpine-w2qjpw | alpine | Retain         | Error | failed to resolve value for    | 4m20s |
|               |        |                |       | accountKey: unable to ...      |       |
+---------------+--------+----------------+-------+--------------------------------+-------+

In the alert notification details, you can also find the detailed error message.

Screenshot 2025-10-28 at 18.43.57.png

{
	"receiver": "Webhook",
	"status": "firing",
	"alerts": [
		{
			"status": "firing",
			"labels": {
				"alertname": "Failed backup",
				"appReference": "alpine",
				"appVaultReference": "demo",
				"backup_name": "alpine-w2qjpw",
				"backup_uid": "581f1d40-2d1e-4f67-ab15-2066e2b53010",
				"container": "kube-state-metrics",
				"creation_time": "2025-10-28T17:41:09Z",
				"customresource_group": "protect.trident.netapp.io",
				"customresource_kind": "Backup",
				"customresource_version": "v1",
				"endpoint": "http",
				"error": "failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found",
				"grafana_folder": "Trident protect",
				"instance": "172.18.0.222:8080",
				"job": "kube-state-metrics",
				"namespace": "prometheus",
				"pod": "trident-protect-kube-state-metrics-c74f969cb-lp67z",
				"service": "trident-protect-kube-state-metrics",
				"status": "Error"
			},
			"annotations": {},
			"startsAt": "2025-10-28T17:42:50Z",
			"endsAt": "0001-01-01T00:00:00Z",
			"generatorURL": "http://localhost:3000/alerting/grafana/df2fk0qcke77kf/view?orgId=1",
			"fingerprint": "f497d0c6cb6d48c8",
			"silenceURL": "http://localhost:3000/alerting/silence/new?alertmanager=grafana\u0026matcher=__alert_rule_uid__%3Ddf2fk0qcke77kf\u0026matcher=appReference%3Dalpine\u0026matcher=appVaultReference%3Ddemo\u0026matcher=backup_name%3Dalpine-w2qjpw\u0026matcher=backup_uid%3D581f1d40-2d1e-4f67-ab15-2066e2b53010\u0026matcher=container%3Dkube-state-metrics\u0026matcher=creation_time%3D2025-10-28T17%3A41%3A09Z\u0026matcher=customresource_group%3Dprotect.trident.netapp.io\u0026matcher=customresource_kind%3DBackup\u0026matcher=customresource_version%3Dv1\u0026matcher=endpoint%3Dhttp\u0026matcher=error%3Dfailed+to+resolve+value+for+accountKey%3A+unable+to+get+secret+trident-protect%2Fpuneptunetest%3A+Secret+%22puneptunetest%22+not+found\u0026matcher=instance%3D172.18.0.222%3A8080\u0026matcher=job%3Dkube-state-metrics\u0026matcher=namespace%3Dprometheus\u0026matcher=pod%3Dtrident-protect-kube-state-metrics-c74f969cb-lp67z\u0026matcher=service%3Dtrident-protect-kube-state-metrics\u0026matcher=status%3DError\u0026orgId=1",
			"dashboardURL": "",
			"panelURL": "",
			"values": {
				"A": 1,
				"C": 1
			},
			"valueString": "[ var='A' labels={__name__=kube_customresource_backup_info, appReference=alpine, appVaultReference=demo, backup_name=alpine-w2qjpw, backup_uid=581f1d40-2d1e-4f67-ab15-2066e2b53010, container=kube-state-metrics, creation_time=2025-10-28T17:41:09Z, customresource_group=protect.trident.netapp.io, customresource_kind=Backup, customresource_version=v1, endpoint=http, error=failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found, instance=172.18.0.222:8080, job=kube-state-metrics, namespace=prometheus, pod=trident-protect-kube-state-metrics-c74f969cb-lp67z, service=trident-protect-kube-state-metrics, status=Error} value=1 ], [ var='C' labels={__name__=kube_customresource_backup_info, appReference=alpine, appVaultReference=demo, backup_name=alpine-w2qjpw, backup_uid=581f1d40-2d1e-4f67-ab15-2066e2b53010, container=kube-state-metrics, creation_time=2025-10-28T17:41:09Z, customresource_group=protect.trident.netapp.io, customresource_kind=Backup, customresource_version=v1, endpoint=http, error=failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found, instance=172.18.0.222:8080, job=kube-state-metrics, namespace=prometheus, pod=trident-protect-kube-state-metrics-c74f969cb-lp67z, service=trident-protect-kube-state-metrics, status=Error} value=1 ]",
			"orgId": 1
		}
	],
	"groupLabels": {
		"alertname": "Failed backup",
		"grafana_folder": "Trident protect"
	},
	"commonLabels": {
		"alertname": "Failed backup",
		"appReference": "alpine",
		"appVaultReference": "demo",
		"backup_name": "alpine-w2qjpw",
		"backup_uid": "581f1d40-2d1e-4f67-ab15-2066e2b53010",
		"container": "kube-state-metrics",
		"creation_time": "2025-10-28T17:41:09Z",
		"customresource_group": "protect.trident.netapp.io",
		"customresource_kind": "Backup",
		"customresource_version": "v1",
		"endpoint": "http",
		"error": "failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found",
		"grafana_folder": "Trident protect",
		"instance": "172.18.0.222:8080",
		"job": "kube-state-metrics",
		"namespace": "prometheus",
		"pod": "trident-protect-kube-state-metrics-c74f969cb-lp67z",
		"service": "trident-protect-kube-state-metrics",
		"status": "Error"
	},
	"commonAnnotations": {},
	"externalURL": "http://localhost:3000/",
	"version": "1",
	"groupKey": "{}/{__grafana_autogenerated__=\"true\"}/{__grafana_receiver__=\"Webhook\"}:{alertname=\"Failed backup\", grafana_folder=\"Trident protect\"}",
	"truncatedAlerts": 0,
	"orgId": 1,
	"title": "[FIRING:1] Failed backup Trident protect (alpine demo alpine-w2qjpw 581f1d40-2d1e-4f67-ab15-2066e2b53010 kube-state-metrics 2025-10-28T17:41:09Z protect.trident.netapp.io Backup v1 http failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found 172.18.0.222:8080 kube-state-metrics prometheus trident-protect-kube-state-metrics-c74f969cb-lp67z trident-protect-kube-state-metrics Error)",
	"state": "alerting",
	"message": "**Firing**\n\nValue: A=1, C=1\nLabels:\n - alertname = Failed backup\n - appReference = alpine\n - appVaultReference = demo\n - backup_name = alpine-w2qjpw\n - backup_uid = 581f1d40-2d1e-4f67-ab15-2066e2b53010\n - container = kube-state-metrics\n - creation_time = 2025-10-28T17:41:09Z\n - customresource_group = protect.trident.netapp.io\n - customresource_kind = Backup\n - customresource_version = v1\n - endpoint = http\n - error = failed to resolve value for accountKey: unable to get secret trident-protect/puneptunetest: Secret \"puneptunetest\" not found\n - grafana_folder = Trident protect\n - instance = 172.18.0.222:8080\n - job = kube-state-metrics\n - namespace = prometheus\n - pod = trident-protect-kube-state-metrics-c74f969cb-lp67z\n - service = trident-protect-kube-state-metrics\n - status = Error\nAnnotations:\nSource: http://localhost:3000/alerting/grafana/df2fk0qcke77kf/view?orgId=1\nSilence: http://localhost:3000/alerting/silence/new?alertmanager=grafana\u0026matcher=__alert_rule_uid__%3Ddf2fk0qcke77kf\u0026matcher=appReference%3Dalpine\u0026matcher=appVaultReference%3Ddemo\u0026matcher=backup_name%3Dalpine-w2qjpw\u0026matcher=backup_uid%3D581f1d40-2d1e-4f67-ab15-2066e2b53010\u0026matcher=container%3Dkube-state-metrics\u0026matcher=creation_time%3D2025-10-28T17%3A41%3A09Z\u0026matcher=customresource_group%3Dprotect.trident.netapp.io\u0026matcher=customresource_kind%3DBackup\u0026matcher=customresource_version%3Dv1\u0026matcher=endpoint%3Dhttp\u0026matcher=error%3Dfailed+to+resolve+value+for+accountKey%3A+unable+to+get+secret+trident-protect%2Fpuneptunetest%3A+Secret+%22puneptunetest%22+not+found\u0026matcher=instance%3D172.18.0.222%3A8080\u0026matcher=job%3Dkube-state-metrics\u0026matcher=namespace%3Dprometheus\u0026matcher=pod%3Dtrident-protect-kube-state-metrics-c74f969cb-lp67z\u0026matcher=service%3Dtrident-protect-kube-state-metrics\u0026matcher=status%3DError\u0026orgId=1\n"
}
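
To clean up after the test, we recreate the puneptunetest secret as shown earlier so that the demo appVault returns to the Available state, and we remove the failed backup CR, for example with kubectl (resource name assumed as before):

$ kubectl -n alpine delete backups.protect.trident.netapp.io alpine-w2qjpw

With the failed backup gone and the secret restored, the alert condition clears and Grafana should send a resolved notification, as it did for the appVault alert.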

Conclusion and call to action

In this blog post, we have demonstrated how to effectively use Grafana's alerting capabilities to monitor the health and performance of your NetApp Trident protect deployments. By setting up alert rules for critical events such as appVault and backup failures, you can ensure timely notifications and take necessary actions to maintain the availability and reliability of your Kubernetes applications. The integration of Prometheus and Grafana provides a powerful monitoring and alerting solution that helps you stay ahead of potential issues and ensures smooth operations.

Now that you have the knowledge to set up and configure alerting for NetApp Trident protect, it's time to put it into practice! Start by following the steps outlined in this guide to create your own alert rules and notifications.

If you have any questions or need further assistance, don't hesitate to reach out to the NetApp community or consult the official documentation. Stay proactive in managing your Kubernetes applications and keep your data protected with NetApp Trident protect.
