Tech ONTAP Blogs

What Would Make You Trust an AI Agent with Your Data Infrastructure?

Rosa_NetApp
NetApp

AI agents are moving from answering questions to taking action. In data infrastructure, that could look like an agent diagnosing an incident, proposing a safe change, tuning performance, tightening access policies, or optimizing cost.

 

That’s exciting—and it’s also where the stakes get real.

 

In data infrastructure, “oops” can mean downtime, cost spikes, accidental deletion, silent data quality issues, or a compliance event. So rather than assuming trust is automatic, I’m starting with a simpler premise:

 

Trust isn’t a vibe. It’s designed—through visibility, control, and guardrails.

 

This post is an open invitation: what do you need to feel trust in your interactions with agents in data infrastructure products? If you’re willing, I’d love your perspective in the comments—especially concrete scenarios and “must-have” safeguards.

 

Where agents could help

 

Across data infrastructure, agents could assist with jobs like:

  • Incident response: detect anomalies, summarize what changed, propose remediation steps, run a playbook
  • Performance tuning: query optimization suggestions, index recommendations, configuration changes
  • Cost optimization: right-sizing, storage tiering, idle resource cleanup, scheduling workloads
  • Governance & security: least-privilege role recommendations, risky access flagging, policy drift detection
  • Data reliability: pipeline failure diagnosis, rerun plans, dependency/lineage summaries
  • Operational hygiene: backlog grooming, alert triage, documentation updates, runbook drafting

But the trust question changes depending on what “assist” means. There’s a big difference between:

  • “Recommend what I should do” and
  • “Do it for me in production.”

 

A quick model: levels of autonomy

 

When you imagine agents in your environment, which level of autonomy would you actually allow today?

 

The Delegation Ladder (from lowest risk to highest)

  1. Explain — answer questions, summarize state (e.g., “What changed?”)
  2. Recommend — suggest actions (e.g., “Here are 3 safe options…”)
  3. Draft — generate a plan/script/config diff for review
  4. Execute with approval — run only after explicit confirmation
  5. Execute & notify — run and report (with easy rollback)
  6. Autonomous within guardrails — acts automatically inside strict boundaries
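
To make the ladder concrete, here's a rough Python sketch of how it could be encoded as a policy gate. This is illustrative only; `AutonomyLevel` and `requires_human_approval` are made-up names, not any product's API.

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """The Delegation Ladder, lowest risk to highest (illustrative only)."""
    EXPLAIN = 1                    # answer questions, summarize state
    RECOMMEND = 2                  # suggest actions
    DRAFT = 3                      # generate a plan/script/diff for review
    EXECUTE_WITH_APPROVAL = 4      # run only after explicit confirmation
    EXECUTE_AND_NOTIFY = 5         # run and report, with easy rollback
    AUTONOMOUS_IN_GUARDRAILS = 6   # act automatically inside strict bounds

def requires_human_approval(level: AutonomyLevel) -> bool:
    # Levels 1-4 keep a human in the loop before anything changes;
    # levels 5-6 act first and lean on notification, audit, and rollback.
    return level <= AutonomyLevel.EXECUTE_WITH_APPROVAL

print(requires_human_approval(AutonomyLevel.DRAFT))  # True
```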

Comment prompt: If you pick a number, tell me why. What’s the risk threshold behind your choice?

 

What “trust” might require 

 

Below are trust ingredients. I’d love your take: which are non-negotiable, which are “nice-to-have,” and which don’t matter to you?

 

1) Visibility: “Show me what you saw”

Before you trust the action, you may need to trust the diagnosis.

  • What evidence did the agent use? (telemetry, logs, metrics, config history)
  • What changed recently?
  • What signals are most important in your world? (latency, error rate, replication lag, storage growth, SLA breaches)
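
As a sketch of what "show me what you saw" could look like in practice, here is a hypothetical evidence bundle an agent might attach to a diagnosis. The schema and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceBundle:
    """What the agent saw before it concluded anything (illustrative schema)."""
    claim: str                                          # the diagnosis being justified
    metrics: dict = field(default_factory=dict)         # e.g., latency, error rate
    recent_changes: list = field(default_factory=list)  # config/deploy history
    log_excerpts: list = field(default_factory=list)    # pointers, not walls of text

evidence = EvidenceBundle(
    claim="Replication lag spiked after last night's config change",
    metrics={"replication_lag_s": 42.0, "error_rate": 0.002},
    recent_changes=["02:13 UTC: snapshot schedule edited"],
    log_excerpts=["repl worker: retry budget exhausted (x117)"],
)
```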

Question: What would you want an agent to show you so you can say, “Yes—this is the real issue”?

 

2) Plan preview: “Let me review the steps”

In many environments, trust starts with a clear plan.

  • Step-by-step plan in plain language
  • A config diff or script preview
  • Dependencies and prerequisites
  • Potential side effects
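
One possible shape for such a plan, sketched in Python. Everything here, including the placeholder commands, is hypothetical rather than any specific CLI.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str               # plain-language step
    command: str                   # the exact command or config diff to apply
    side_effects: list = field(default_factory=list)

@dataclass
class ChangePlan:
    summary: str
    steps: list                    # ordered PlanStep objects
    prerequisites: list = field(default_factory=list)
    rollback: str = ""             # how to undo (see reversibility below)

plan = ChangePlan(
    summary="Raise the volume growth limit to absorb projected demand",
    steps=[PlanStep("Increase the size ceiling on vol1",
                    "<config diff or command preview goes here>",
                    side_effects=["free space in the aggregate drops faster"])],
    prerequisites=["change window open", "aggregate has headroom"],
    rollback="<inverse diff or command goes here>",
)
```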

Question: What is the minimum plan detail you require before approving an action?

 

3) Guardrails & constraints: “Don’t exceed my boundaries”

Many teams are open to agent help if it’s bounded:

  • Scope constraints (only this cluster, this namespace, this dataset)
  • Time constraints (only during change windows)
  • Cost constraints (never exceed $X/day; alert if projection changes)
  • Policy constraints (data residency, retention, encryption requirements)
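
To illustrate, guardrails like these could be declared once and checked before every action. A minimal Python sketch, with invented names and thresholds:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Guardrails:
    allowed_scopes: set            # e.g., {"cluster-a/ns-dev"}
    change_window_hours: range     # UTC hours when changes are allowed
    daily_cost_ceiling_usd: float  # never exceed this projection

    def permits(self, scope: str, projected_cost_usd: float) -> bool:
        hour_utc = datetime.now(timezone.utc).hour
        return (scope in self.allowed_scopes
                and hour_utc in self.change_window_hours
                and projected_cost_usd <= self.daily_cost_ceiling_usd)

rails = Guardrails(allowed_scopes={"cluster-a/ns-dev"},
                   change_window_hours=range(2, 6),  # 02:00-05:59 UTC
                   daily_cost_ceiling_usd=200.0)
# Outside the boundary, the agent drops back down the ladder to "Recommend".
print(rails.permits("cluster-a/ns-dev", projected_cost_usd=50.0))
```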

Question: What guardrails would you insist on before delegating anything beyond recommendations?

 

4) Permissions alignment: “Never escalate privileges”

In infrastructure, the agent must behave like a disciplined operator:

  • Honors RBAC/ABAC exactly
  • Never “works around” controls
  • Makes permission needs explicit (“I can’t do X without Y permission”)
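
Here's a tiny sketch of that behavior: the agent checks its own grants and surfaces the gap instead of escalating. All names are hypothetical.

```python
class PermissionDenied(Exception):
    """Raised so a missing grant is surfaced, never worked around."""

def run_action(action: str, granted: set, required: str) -> None:
    # A disciplined agent checks its grants exactly like an operator would,
    # and states the missing permission explicitly instead of escalating.
    if required not in granted:
        raise PermissionDenied(
            f"Cannot run '{action}': missing '{required}'. "
            "A human can grant it, or keep me at recommend-only."
        )
    print(f"Running '{action}' within granted permissions.")

try:
    run_action("resize volume", granted={"volume:read"}, required="volume:modify")
except PermissionDenied as exc:
    print(exc)  # the human decides whether to grant; the agent never self-escalates
```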

Question: How should an agent handle situations where it could fix something but lacks permission?

 

5) Auditability & traceability: “If it acts, it must leave a trail”

For many teams, trust is inseparable from accountability:

  • Who/what initiated the action?
  • What was changed, when, and where?
  • What inputs were used?
  • What was the outcome?
  • Can support/ops reconstruct the chain of events quickly?
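
For discussion, here is one shape an append-only audit entry could take; the field names are assumptions, not a standard.

```python
import json
from datetime import datetime, timezone

def audit_record(initiator: dict, action: str, target: str,
                 inputs: dict, outcome: str) -> dict:
    """One append-only entry per agent action (illustrative fields)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "initiator": initiator,   # agent ID plus the human who delegated it
        "action": action,         # what was done
        "target": target,         # what changed, and where
        "inputs": inputs,         # the evidence/plan behind the decision
        "outcome": outcome,       # success, failure, rolled back
    }

entry = audit_record(
    initiator={"agent": "tuning-agent-01", "delegated_by": "ops-oncall"},
    action="applied index recommendation",
    target="cluster-a/db-7",
    inputs={"plan_id": "plan-123", "approved": True},
    outcome="success",
)
print(json.dumps(entry, indent=2))
```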

Question: What would your ideal audit log capture to be truly useful?

 

6) Reversibility: “Make it easy to undo”

Even great operators make mistakes. Trust often hinges on:

  • Rollback plans baked into execution
  • Snapshots/backups before risky operations
  • Safe fallback behavior
  • Clear “undo” affordances
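
A snapshot-before-change pattern is one way to bake rollback into execution. In this sketch the snapshot functions are caller-supplied placeholders, not any specific product API:

```python
def execute_with_rollback(change, snapshot_create, snapshot_restore):
    """Take a recovery point first; undo automatically if the change fails."""
    snap_id = snapshot_create()    # recovery point before anything risky
    try:
        change()
    except Exception:
        snapshot_restore(snap_id)  # safe fallback: return to known-good state
        raise                      # still surface the failure for the audit trail

# Toy usage with stand-in functions:
execute_with_rollback(
    change=lambda: print("applying config change"),
    snapshot_create=lambda: "snap-001",
    snapshot_restore=lambda snap: print(f"restoring {snap}"),
)
```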

Question: Which matters more in your environment: preventing mistakes or recovering quickly from mistakes—and why?

 

7) Honest uncertainty: “Don’t pretend to be sure”

Overconfidence is a trust killer. Many teams prefer an agent that says:

  • “Here’s what I know”
  • “Here’s what I’m missing”
  • “Here are options with tradeoffs”
  • “Here’s the safest next step”
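
Structurally, that preference could look like an agent response that separates knowns from unknowns. A hypothetical schema, sketched in Python:

```python
from dataclasses import dataclass, field

@dataclass
class AgentFinding:
    """Keep what's known apart from what's missing (illustrative schema)."""
    known: list                                  # facts backed by evidence
    missing: list                                # data the agent could not see
    options: list = field(default_factory=list)  # (action, tradeoff) pairs
    safest_next_step: str = ""

finding = AgentFinding(
    known=["error rate doubled after the 02:13 UTC change"],
    missing=["no access to application-side logs"],
    options=[("roll back the change", "loses last night's tuning"),
             ("raise the connection limit", "may mask the real cause")],
    safest_next_step="roll back, then re-test inside the change window",
)
```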

Question: How should an agent communicate uncertainty without slowing you down during critical moments?

 

Over to you: What do you need to feel trust when an agent is involved?

 

As always, your feedback and input are valued. Thank you for taking the time to chime in.

-Rosa
