NetApp’s Metadata Engine now gives storage teams – and the AI engineers depending on them – instant visibility across their data estate
If you've spent any time building or supporting AI infrastructure, you already know the real bottleneck isn't the GPU cluster or the models; it's data. Specifically, it's your organization's inability to answer basic questions about its data at scale: what do we have, where is it, how old is it, and is it actually useful for this pipeline?
As of today, the NetApp Metadata Engine (MDE), the core metadata intelligence layer of the NetApp AI Data Engine (AIDE), is available as a software-only package. Deploy it on your choice of recommended third-party servers today, and upgrade to the full AIDE platform with a simple software update when additional functionality becomes available.
The Problem: Unstructured Data Is Effectively Dark
In most enterprise environments, data is scattered across hybrid cloud deployments, on-premises NAS, and dozens of workload-specific volumes. It's expensive to query, rarely surfaced to the teams who need it, and almost impossible to act on at scale.
For storage teams, this means reactive firefighting: capacity alerts, tiering decisions made on gut feel, and access audits that take days. For the data and AI engineers building AI pipelines, it means spending the majority of project time just finding and validating data before any real work begins. It also means fragile inferencing pipelines, because nobody can confidently verify data provenance and recency.
The MDE solves all three problems from a single platform.
What the Metadata Engine Does
The MDE is a unified, global, live, searchable metadata repository that supports billions of files and objects across heterogeneous storage environments. Through the AI Data Engine Console, you get a centralized view of your entire data estate - no custom scripts, no batch jobs, no querying individual storage controllers.
Search and filtering let you query across your estate by file type, size, and age. Identifying, for example, all files over 50GB that haven't been accessed in 18 months - a query that previously required custom tooling - becomes a straightforward filter operation. Search queries can be saved for reuse and results exported to CSV, so recurring operational workflows get systematized instead of rebuilt from scratch each cycle.
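To make that cold-data query concrete, here is a minimal, self-contained Python sketch of the filter logic. The record shape and field names below are illustrative assumptions for this example, not the MDE's actual schema; in practice you would run this as a saved filter in the console or via the API rather than in local code.

```python
from datetime import datetime, timedelta

# Illustrative file-metadata records; the field names are assumptions
# made for this sketch, not the MDE's actual schema.
files = [
    {"path": "/vol1/archive/sim_run_001.dat", "size_bytes": 80 * 1024**3,
     "last_access": datetime(2023, 1, 10)},
    {"path": "/vol1/active/model.ckpt", "size_bytes": 60 * 1024**3,
     "last_access": datetime(2025, 6, 1)},
    {"path": "/vol2/logs/app.log", "size_bytes": 2 * 1024**3,
     "last_access": datetime(2022, 5, 3)},
]

def cold_large_files(records, min_size_bytes, last_access_before):
    """Return records larger than min_size_bytes and not accessed
    since last_access_before -- the '50GB / 18 months' filter."""
    return [r for r in records
            if r["size_bytes"] > min_size_bytes
            and r["last_access"] < last_access_before]

today = datetime(2025, 11, 1)          # fixed date keeps the example deterministic
cutoff = today - timedelta(days=548)   # roughly 18 months
matches = cold_large_files(files, 50 * 1024**3, cutoff)
# matches contains only the 80GB archive file untouched since early 2023
```

The same predicate, expressed as a saved search, is what turns a recurring tiering review into a repeatable, exportable operation.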
Custom tagging APIs let any team enrich file system metadata with business context. A data engineering team can mark pipeline stage indicators. A business unit can tag files by project and cost center to determine chargeback and enable automated workflows. All of it lives alongside technical file metadata in one queryable repository. The MDE also supports ISV partner integrations for metadata enrichment and persona-specific or vertical-specific tools, making it a hub for third-party tools rather than another isolated silo.
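As a sketch of the tagging workflow, the toy in-memory store below shows how business tags from different teams can coexist on the same file and be queried together. The function names and tag keys here are hypothetical; the MDE applies and persists tags through its own APIs alongside the technical file metadata.

```python
from collections import defaultdict

# Hypothetical in-memory tag store for illustration only; the real MDE
# persists custom tags next to technical metadata via its tagging APIs.
tags = defaultdict(dict)  # path -> {tag_key: tag_value}

def apply_tag(path, key, value):
    """Attach a key/value business tag to a file path."""
    tags[path][key] = value

def find_by_tag(key, value):
    """Return all paths whose tag 'key' equals 'value'."""
    return [p for p, t in tags.items() if t.get(key) == value]

# A data engineering team marks pipeline stage...
apply_tag("/vol1/ingest/batch_42.parquet", "pipeline_stage", "validated")
# ...while a business unit tags the same file for chargeback.
apply_tag("/vol1/ingest/batch_42.parquet", "cost_center", "CC-1043")
apply_tag("/vol1/ingest/batch_07.parquet", "pipeline_stage", "raw")

validated = find_by_tag("pipeline_stage", "validated")
```

Because both tags live on one record, a single query can answer a question that spans teams, such as "validated files billed to cost center CC-1043."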
Watch the demo video for a complete walkthrough of the console, filter interface, custom tag overview, and result export.

Value for Storage Engineers
The MDE gives storage teams data estate visibility and optimization capabilities that weren't previously achievable with native tooling:
- Understand consumption patterns by user, application, and file type so capacity planning becomes data-driven instead of reactive
- Identify tiering opportunities with precision by querying for data matching specific age, access recency, and size criteria, enabling intelligent policies that meaningfully reduce cold data costs on all-flash tiers
- Detect security anomalies through metadata signals - bulk permission changes surface in near real-time without requiring a separate security analytics platform
- Accelerate compliance responses from days to seconds using custom tags and targeted search
Enabling Data and AI Engineering Teams
The MDE isn't just a storage operations tool; it's infrastructure that directly unblocks other teams, and storage teams are in a position to deliver that value.
For data engineers, the MDE eliminates the most time-consuming phase of pipeline development: data discovery. Instead of filing requests with the storage team and waiting for manual scans, a data engineer can query the MDE directly ("show me all JSON files from the ingestion pipeline in the last 30 days") and get an exportable file list in minutes.
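A local Python sketch of that query, with hypothetical record fields standing in for the MDE's metadata and the stdlib `csv` module standing in for the console's CSV export:

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical metadata records; field names are assumptions for this sketch.
records = [
    {"path": "/ingest/events_0301.json", "type": "json",
     "modified": datetime(2025, 10, 20)},
    {"path": "/ingest/events_0115.json", "type": "json",
     "modified": datetime(2025, 8, 2)},
    {"path": "/ingest/schema.yaml", "type": "yaml",
     "modified": datetime(2025, 10, 25)},
]

today = datetime(2025, 11, 1)  # fixed date keeps the example deterministic

# "All JSON files from the ingestion pipeline in the last 30 days."
recent_json = [r for r in records
               if r["type"] == "json"
               and r["modified"] >= today - timedelta(days=30)]

# Export the hit list, mirroring the console's CSV export of results.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["path", "type", "modified"])
writer.writeheader()
for r in recent_json:
    writer.writerow(r)
```

The exported CSV is exactly the artifact a pipeline author needs to hand off to the next stage, with no storage-team ticket in the loop.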
For AI engineers, the MDE addresses the metadata confidence problem at the root of many failed training pipelines. Training on stale, duplicate, or misclassified data isn't usually a curation failure; it's a visibility failure. With a live, globally consistent metadata view and custom tag support, AI engineers can build pipelines that select data based on verified metadata rather than assumptions about what should be there.
Storage teams can enable all of this directly: set up Workspaces, apply custom tags via the API, and give data and AI teams access to the console or APIs. That's how storage organizations stop being passive ticket-takers and become active contributors to AI project success.
Deploy Now, Upgrade Later
MDE is available today on customer-chosen supported servers - no dedicated NetApp hardware required. The metadata foundation you build now (Workspaces, custom tags, ISV integrations) carries forward intact to the full AIDE platform.
The path forward is straightforward: deploy MDE today, start building metadata workflows, enable your data and AI teams with real data estate visibility, and unlock Data Guardrails and Data Curator (with vectorization) with a software update when additional AIDE functionality becomes available.
No hardware swap. No migration project. Just an accelerated path to a complete AI data pipeline platform starting now.