Tech ONTAP Blogs

DocumentRAG Using OpenSearch: GraphRAG-like Structure Without the Graph Overhead

DavidvonThenen
NetApp

Vector embeddings changed how teams build RAG systems. They made it easy to scan large datasets and pull back passages that feel semantically close to a question. And for a while, that was enough. You could drop your documents into an embedding model, compute vectors, plug everything into your favorite vector database, and call it a day. But as more organizations tried to use these systems for compliance, capturing business rules, and decision-making, the limits became clear. Vector search is driven by statistical closeness, not by whether the retrieved text actually answers the question. It's correlation over understanding, probability over precision. 

 

Additionally, vector similarity hides everything behind layers of GPU math. The ranking logic is opaque, even for engineers who work with it every day. When an auditor or domain expert asks why a specific passage was selected, you can't point to a keyword match or a phrase hit. You get a cosine score, a shrug, and a hope that everyone in the room likes linear algebra. This is where many teams run into trouble: they need transparency, not mystery. They need retrieval steps that can be explained, logged, and verified. Vector search, by design, resists all of that.
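
To make that concrete, here's a minimal sketch (not taken from any implementation discussed in this post) of what an engineer can actually show an auditor with dense retrieval: a single cosine score, with no indication of which words or phrases produced it. The embedding values below are made up purely for illustration.

```python
# Minimal illustration: cosine similarity returns one number with no
# explanation of which words or concepts drove the match.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for a query and two passages (values are made up).
query = np.array([0.12, -0.48, 0.33, 0.71])
passage_a = np.array([0.10, -0.52, 0.30, 0.69])   # policy exception text
passage_b = np.array([0.45, -0.10, 0.62, 0.11])   # unrelated boilerplate

print(cosine_similarity(query, passage_a))  # high score, e.g. ~0.99 -- but *why*?
print(cosine_similarity(query, passage_b))  # lower score, equally opaque
```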

 

[Figure: semantic-similarity.png]

 

DocumentRAG offers another path. Instead of betting everything on dense vector similarity, it leans on BM25, phrase matches, explicit terms, and standard full-text search. This approach looks and behaves much closer to GraphRAG. You treat documents as structured knowledge instead of undifferentiated chunks. You anchor retrieval in clear signals that humans can inspect, reason about, and trust. And because the logic is plain, you gain a retrieval pipeline that supports explainability, audit trails, and governance requirements out of the box. In practice, it turns retrieval into something that behaves like a lightweight knowledge graph without the operational overhead of actually running one. 

 

This foundation is what makes DocumentRAG compelling, and that's what we'll explore in the next few sections of this blog post.

 

Vector Embedding Limitations

 

Vector embeddings promise semantic understanding, but in practice, they flatten your entire corpus into isolated points in a high-dimensional space. Every paragraph, sentence, or chunk becomes a standalone vector with no memory of where it came from or how it relates to the rest of the knowledge base. The model doesn't know that Section 3 depends on Section 1, or that a policy exception contradicts the default rule three paragraphs earlier. Everything is treated as independent. This is fine for simple lookups but falls apart the moment you need multi-hop reasoning, cross-document relationships, or a holistic view of how the data fits together. The Microsoft GraphRAG blog calls this out directly: vector search is great at fuzzy recall, but it's structurally blind to how concepts connect across a corpus. 

 

This lack of structure creates a much larger gap when an AI system needs to reason rather than retrieve. Because embeddings discard relationships, the model must infer all structure at generation time. It's trying to reconstruct a mental map of your system rules, business logic, and domain entities from vectors that never encoded those relationships in the first place. The result is fragile. It often works for simple tasks and fails quietly on complex ones. The video below drives the point home: when everything is flat, the retrieval engine can't follow multi-step paths or maintain context across documents. 

 

 

This is a well-known problem with vector embeddings, and it also underlies how LLMs structure their internal data after training. Data scientists have been hard at work for over a year looking into other possible solutions to this data representation conundrum. One example is a research paper that explores the idea of multidimensional vector search. At a very high level, this type of search considers multiple dimensions, or multiple vector spaces, when finding the correct "match" or "fit" for a given vector or embedding. The theory is that this will yield better, more accurate answers to user prompts.

 

[Figure: research-paper.png]

 

All of this leads to the core issue: observability and governance are nearly impossible to achieve. The embedding doesn't tell you which words matter or what features influence a match. It can't show you the chain of thought between query and result because there isn't one. Everything happens inside a GPU you can't easily inspect. When teams in regulated spaces ask for transparency, they discover that the vector store can't provide it. Then, when they try to validate retrievals for safety, bias, or correctness, they end up auditing cosine similarity rather than the underlying facts. That's the opposite of what reliable RAG systems need.

 

Why Are Implementations Turning To GraphRAG?

 

Other RAG architectures, such as those that use a Graph database rather than a Vector database, are picking up significant steam for two main reasons:

  1. Answers Need To Be Grounded In Facts 
  2. Explainability and Observability Must Do Better 

 

But how do these architectures win out over Vector-based RAG in these categories? They address two fundamental concepts in the Retrieval portion of Retrieval-Augmented Generation pipelines: they give you flexibility, at the cost of having to supply the retrieval implementation yourself.

 

First, by taking control of the implementation details for the Retrieval step, you decide which documents to retrieve from the data store or database that houses the information. In a GraphRAG implementation, we call this retrieving a knowledge graph or a relevant subset of your dataset. You can retrieve as little or as much of the subset of data as needed. It could be a few documents or an entire book. You decide the "map" to find that data. 
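
As a rough sketch of what owning that Retrieval step can look like (the helper names below are hypothetical, not the reference implementation), the application extracts the entities from the question and uses them to pull a bounded subset of documents. The caller, not the database, decides how wide that "map" is.

```python
# Sketch of an externally controlled Retrieval step (hypothetical helpers;
# the actual DocumentRAG implementation may differ).
from typing import List, Dict

def extract_entities(question: str) -> List[str]:
    # Placeholder: in practice this could be an NER model or a rules pass.
    return [t.strip("?.,").lower() for t in question.split() if t.istitle()]

def retrieve_subset(question: str, store: List[Dict], max_docs: int = 10) -> List[Dict]:
    """Pull only the documents that explicitly mention the question's entities."""
    entities = extract_entities(question)
    hits = [
        doc for doc in store
        if any(e in doc["text"].lower() for e in entities)
    ]
    return hits[:max_docs]  # you decide how little or how much context to keep
```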

 

[Figure: retrival-and-rerank.png]

 

Second, you can rank the relevance of the paragraphs, chunks, documents, etc., that were retrieved to decide which ones matter most in answering that question or prompt. If you had a crystal ball, a single document with the exact answer would be ideal, but in reality, it doesn't work that way. Luckily, in this architecture, you have control not only over what counts as relevant, but also over how much weight each item in your subset carries.
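
Here is an equally rough sketch of that ranking step, using a deliberately simple signal (entity overlap) as the relevance weight. The weighting you actually choose is up to you; the point is that it is yours to define and inspect.

```python
# Sketch of a controllable re-ranking pass (illustrative weighting only):
# score each retrieved chunk by how often it mentions the question's
# entities, so the most relevant evidence is handed to the LLM first.
from typing import List, Dict

def rerank(chunks: List[Dict], entities: List[str]) -> List[Dict]:
    def score(chunk: Dict) -> float:
        text = chunk["text"].lower()
        return sum(text.count(e.lower()) for e in entities)
    return sorted(chunks, key=score, reverse=True)

# Usage (with the hypothetical helpers above):
# ranked = rerank(retrieve_subset(question, store), extract_entities(question))
```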

 

Not only is externalizing the Retrieval process better for correctness, but it also has the added benefit of being far more explainable and observable in GraphRAG and DocumentRAG, which turns out to be much better for AI governance and trust.

 

DocumentRAG Provides a GraphRAG-like Implementation

 

DocumentRAG (BM25-based) shifts retrieval from a probabilistic guess to an auditable process. Instead of relying on vector similarity to approximate meaning, BM25 surfaces documents based on explicit signals: keywords, phrases, entities, and field-level matches. That difference sounds small, but it changes how the entire system behaves. When retrieval is rooted in text you can point to, you gain the same structural benefits that make GraphRAG appealing without needing a full knowledge graph in the background. You're no longer treating your corpus as a cloud of vectors. You're treating it as a set of documents connected by shared entities and concepts. 
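
As a minimal illustration (index and field names are assumptions, not the reference design's exact schema), here is what such a query can look like with the opensearch-py client. Every clause is a signal a human can read: a phrase, a keyword, an exact entity match.

```python
# Minimal sketch: each clause is a human-readable signal rather than an
# opaque vector distance. Field names ("body", "entities") are assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

query_body = {
    "query": {
        "bool": {
            "should": [
                {"match_phrase": {"body": "data retention policy"}},   # phrase hit
                {"match": {"body": "retention"}},                      # BM25 keyword match
                {"term": {"entities": "gdpr"}},                        # exact entity field match
            ],
            "minimum_should_match": 1,
        }
    }
}

results = client.search(index="documents", body=query_body)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```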

 

This is precisely what the DocumentRAG design emphasizes. Each document carries explicit terms extracted at ingest, stored as both keyword fields and text fields for lexical scoring. Those terms serve as anchors. When a user asks a question, the system extracts the same entities and uses them to drive a deterministic BM25 query. And now retrieval has shape. Instead of asking, "What's close in the vector space?" the system asks, "Which documents explicitly mention these entities, and how strongly?" It mirrors the spirit of GraphRAG's entity-driven retrieval, where structure comes from the domain rather than from embedding geometry. The result is a step toward knowledge graphs without the engineering overhead of building and maintaining one. If you are interested in learning more about GraphRAG, take a look at the blog post From "Trust Me" to "Prove It": Why Enterprises Need GraphRAG, or check out a presentation from All Things Open AI earlier this year.
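
A sketch of that ingest-time structure might look like the following (again, index and field names are assumptions). The extracted terms are stored twice: once as a keyword field for exact, filterable entity anchors, and once as a text field so BM25 can score them lexically.

```python
# Sketch of the ingest-time structure described above (schema is assumed).
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="documents",
    body={
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "body": {"type": "text"},             # scored by BM25
                "entities": {"type": "keyword"},      # exact, auditable anchors
                "entities_text": {"type": "text"},    # same terms, lexically scored
                "source": {"type": "keyword"},        # provenance
                "ingested_at": {"type": "date"},
            }
        }
    },
)

client.index(
    index="documents",
    body={
        "title": "Data Retention Policy",
        "body": "Customer records are retained for seven years ...",
        "entities": ["gdpr", "retention", "customer records"],
        "entities_text": "gdpr retention customer records",
        "source": "policies/retention-v3.pdf",
        "ingested_at": "2025-01-15T00:00:00Z",
    },
)
```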

 

 

The governance benefits follow naturally because BM25 and entity fields are transparent. You can explain exactly why a document was selected. You can show the clauses that matched, the fields that mattered, and the terms that drove the score. Tracing and logs tell reviewers exactly which portions of text satisfied the query. Provenance metadata tracks where the document came from, when it was ingested, and which version is being served. These capabilities align directly with the principles of GraphRAG; they support traceability, auditability, reproducibility, and human verification. Nothing happens inside a black box. Everything is observable. 
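
For example, OpenSearch can return its scoring breakdown alongside each hit. The sketch below (index and field names assumed) sets "explain": true so a reviewer can see exactly which terms and clauses produced a document's score.

```python
# Minimal sketch of pulling scoring evidence for a reviewer: with
# "explain": true, each hit carries a term-by-term breakdown of its score.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

results = client.search(
    index="documents",
    body={
        "explain": True,
        "query": {"match": {"body": "data retention policy"}},
    },
)

for hit in results["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
    print(hit["_explanation"])  # which clauses and terms drove the match
```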

 

This reference design using OpenSearch also solves a problem that pure semantic search can't touch: trust. When answers originate from documents that clearly match user intent, stakeholders gain confidence in the system. When retrieval can be reproduced days later, teams can validate model behavior rather than guess why results changed. This is why BM25-based RAG feels so much like GraphRAG in practice. Both approaches recognize that AI systems need structure (entities, relationships, provenance, and constraints) to deliver reliable answers. GraphRAG builds that structure explicitly with nodes and edges. DocumentRAG builds a lighter version using entity fields, lexical scoring, and clear document boundaries. For many teams, that middle ground is enough to unlock the benefits of structured retrieval without committing to a full graph stack. 
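
One simple way to make retrieval reproducible, sketched below with an assumed log format (not part of the reference design), is to persist the exact query body and the evidence it returned so the same retrieval can be replayed and compared later.

```python
# Illustrative sketch: record the query and the evidence it returned so a
# retrieval can be replayed and audited days later.
import datetime
import json

def log_retrieval(query_body: dict, results: dict, path: str = "retrieval_audit.jsonl") -> None:
    record = {
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "query": query_body,
        "hits": [
            {"id": h["_id"], "score": h["_score"], "source": h["_source"].get("source")}
            for h in results["hits"]["hits"]
        ],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```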

 

Unlock Better Answers

 

DocumentRAG and GraphRAG share the same core philosophy: answers should come from structured, verifiable knowledge… not statistical guesses. They both ground retrieval and ranking in the actual data, and they both encourage systems to think in terms of entities, relationships, and provenance. GraphRAG takes this idea to its full expression by building a dedicated knowledge graph with explicit links between concepts. It's powerful, especially when the domain requires multi-hop reasoning or deep contextual understanding. But it also demands careful schema design, graph maintenance, and operational overhead that not every team is ready for.

 

[Figure: new-knowledge-graph.png]

 

DocumentRAG lands in a practical middle ground. There will be a follow-up blog post in the coming weeks where we will explore this DocumentRAG implementation using OpenSearch in more detail, but suffice it to say, this implementation intentionally doesn't use OpenSearch's built-in BM25 implementation nor does it use the "hybrid" BM25/vector search functionality. It sounds counterintuitive to implement an external mechanism when one is already provided, but you get a retrieval pipeline that behaves like a lightweight knowledge graph: clear boundaries, explainable results, and evidence you can show to auditors or stakeholders. The best part is that nothing lives in a black box. Every result can be traced, inspected, and reproduced. 
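
Details are coming in that follow-up post, but as one illustration of what "externalizing" lexical scoring can look like (not necessarily the implementation we'll cover), a library such as rank_bm25 can score candidate documents entirely outside the database, so the ranking logic is fully visible and tunable.

```python
# One possible illustration of externalized lexical scoring: pull candidate
# documents from the store, then score them yourself with a BM25 library
# you control, so the ranking logic is visible and adjustable.
from rank_bm25 import BM25Okapi   # pip install rank-bm25

candidates = [
    "Customer records are retained for seven years under the retention policy.",
    "The marketing team publishes a quarterly newsletter.",
    "GDPR requires documented justification for extended data retention.",
]

tokenized = [doc.lower().split() for doc in candidates]
bm25 = BM25Okapi(tokenized)

query_tokens = "data retention policy".lower().split()
scores = bm25.get_scores(query_tokens)   # one transparent score per document

for doc, score in sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```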

 

Don't know where to start? I've got you covered. Here are reference implementations, complete with code, that cover ingest to inference:

For many organizations, that balance is exactly what they need. It offers a path toward trustworthy AI systems without committing to the full weight of a graph database on day one. As your requirements mature, the ideas behind DocumentRAG naturally carry forward into GraphRAG and other structured retrieval frameworks. It's a step toward more reliable, more transparent AI and one that teams can easily adopt today.

 

The next and last blog in this series tackles the differences between GraphRAG and DocumentRAG, and it will discuss why we didn't use the native Hybrid Search functionality in OpenSearch. The TL;DR on that story: this implementation externalizes the Retrieval process so that it can be customized to the language domain and remain explainable. Another way to describe it: suppose I didn't get the documents I thought I should have gotten; how would you steer retrieval toward the documents you need to answer the question? With a knowledge graph, you can express that relationship directly. If you are using vector embeddings or hybrid search, there isn't a straightforward way of doing that without involving data scientists to either groom the dataset or synthetically add data to act as a bridge.
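
As a tiny, purely illustrative preview of that steering idea (field names and the boost value are made up), an externalized query gives you a direct handle to push on: boost the bridging entity you expected to see, and retrieval shifts in that direction.

```python
# Illustrative only: because the query is an explicit, externalized
# structure, steering retrieval toward missing documents can be as simple
# as boosting a bridging entity -- a knob a vector index doesn't expose.
steered_query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"body": "data retention policy"}},
                # Boost the entity that links the policy docs to the related
                # documents we expected but did not get back.
                {"term": {"entities": {"value": "gdpr", "boost": 3.0}}},
            ],
            "minimum_should_match": 1,
        }
    }
}
```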
