Tech ONTAP Blogs

How to create a RAG-based AI application on FSx for ONTAP with BlueXP workload factory GenAI

MickeySh
NetApp


To thrive in the generative artificial intelligence (GenAI) revolution, businesses must cultivate one fundamental capability: agility.


NetApp has introduced BlueXP™ workload factory for AWS and its workload factory GenAI capability to help you seamlessly create managed retrieval-augmented generation (RAG)-based AI applications, such as chatbots. With RAG, you can personalize foundational models to derive knowledge from your company's structured and unstructured data sources, ensuring your context-aware AI applications are tailored precisely to your needs. 


This step-by-step guide will walk you through an end-to-end example by showing you how to add context retrieval from your embedded Amazon FSx for NetApp ONTAP (FSx for ONTAP) data sources to an AI chatbot developed in LangChain, powered by AWS and workload factory GenAI capabilities.


Read on or jump down using the links here:

How BlueXP workload factory GenAI works

Overview of the tech stack

Step-by-step guide

Setting up access to your embedded knowledge base

Connecting to your LanceDB database

Building your AI chatbot

Query your AI chatbot

What’s next?


How BlueXP workload factory GenAI works

Workload factory empowers FSx for ONTAP users to deploy, optimize, automate, and operate AWS workloads through intuitive wizards, comprehensive dashboards, and ready-to-deploy infrastructure as code (IaC). Creating a GenAI application powered by Amazon Bedrock is simple with workload factory.


From the workload factory GenAI page, you can create and manage your knowledge bases to set up RAG-based GenAI applications powered by Amazon Bedrock. 


[Screenshot: the BlueXP workload factory GenAI page]



Workload factory GenAI goes further, enabling you to unlock the power of your FSx for ONTAP data sources for any GenAI solution design, with any open-source or proprietary tooling of your choice.


You can use workload factory GenAI to access and utilize your internal data on FSx for ONTAP for intelligent and context-aware AI chatbot interactions with a RAG-based solution developed with LangChain and LanceDB. Let’s see how it’s done.


Overview of the tech stack

In this guide, you’ll set up a RAG-based GenAI application that uses:

  • Workload factory to create a knowledge base of your proprietary data hosted in FSx for ONTAP, embedded with advanced Amazon Bedrock GenAI models.
  • The open-source LanceDB for the vector database.
  • LangChain for AI application development. 


Let’s briefly introduce each technology before diving in.


FSx for ONTAP

FSx for ONTAP is a fully managed shared storage solution that combines NetApp® ONTAP® software with AWS. For the GenAI use case, FSx for ONTAP hosts both your enterprise data and the vector database.


BlueXP workload factory

BlueXP workload factory helps you deploy, automate, optimize, and manage AWS workloads on FSx for ONTAP. Workload factory simplifies the deployment and management of AWS resources, following industry best practices by design.


BlueXP workload factory GenAI

The GenAI capability in BlueXP workload factory provides an intuitive, low-code interface to set up and manage RAG knowledge bases. These knowledge bases help to securely connect private data on FSx for ONTAP to models delivered via Amazon Bedrock. These knowledge bases can then be used by GenAI applications, such as chatbots, to derive intelligence from source data in the knowledge base. 


The result is AI models and applications that use your organization’s private data efficiently and securely. To learn more, read our BlueXP workload factory GenAI blog.


LanceDB

LanceDB is an open-source vector database that stores and retrieves high-dimensional vectors for multimodal AI. It is designed for applications that require fast, scalable similarity search, making it an excellent choice for AI-driven applications, such as chatbots, that use large language models (LLMs).
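
To make the similarity-search idea concrete, here is a minimal, self-contained LanceDB sketch; the database path, table name, and toy vectors are made up for illustration and are not part of the workload factory setup:

import lancedb

# Connect to a local LanceDB database directory (hypothetical path)
db = lancedb.connect('/tmp/lancedb-demo')

# Create a small table of vectors with an associated text payload
table = db.create_table(
    'demo_vectors',
    data=[
        {'vector': [1.0, 0.0], 'document': 'first sample chunk'},
        {'vector': [0.0, 1.0], 'document': 'second sample chunk'},
    ],
)

# Retrieve the row whose vector is most similar to the query vector
results = table.search([0.9, 0.1]).limit(1).to_list()
print(results[0]['document'])  # -> 'first sample chunk'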


Workload factory uses LanceDB to store the embedding vectors and associated metadata (such as permissions), generated from embedding the FSx for ONTAP data sources of your knowledge bases, in an FSx for ONTAP volume. 


Amazon Bedrock

Amazon Bedrock is a managed service from AWS for building scalable GenAI applications. It provides access to various well-known foundational models—from companies such as Mistral AI, AWS, AI21 Labs, and others—through a single API, simplifying the process of integrating advanced AI capabilities into your end-user applications.


LangChain

The LangChain framework makes it possible to develop applications that use LLMs in a vendor-agnostic manner. It offers tools and abstractions to seamlessly connect LLMs with various data sources through a unified interface, simplifying the development of complex AI applications, including chatbots, that are future-proof.


Step-by-step guide

In this guide, you build a chatbot that integrates with Amazon Bedrock’s GenAI models through a LangChain chain. Amazon Bedrock handles question answering using documents retrieved from LanceDB, whose embeddings were generated by your BlueXP workload factory knowledge base and synced from the FSx for ONTAP file system.

There are three steps involved in the process:


  1. Setting up access to your embedded FSx for ONTAP knowledge base.

RAG involves creating a knowledge base that integrates your proprietary data from FSx for ONTAP with advanced AI models. The setup includes defining the embedding and chat models, configuring which data sources to retrieve, and initiating the embedding process for the defined knowledge base. The embedding vectors for your workload factory knowledge base(s) are stored in the LanceDB instance associated with your workload factory environment.


  2. Connecting to your LanceDB database.

LanceDB is a powerful vector search database that stores and indexes your data for efficient retrieval. To make the vector embeddings accessible to the application, you set up an authenticated connection to the LanceDB instance associated with your workload factory environment, and access the table associated with your chosen knowledge base via the knowledge base id.


The data in the knowledge base is permissions-aware, meaning that users who query the data will not get access to data they don’t have permissions to access. This is explained in more detail below.


  3. Building your AI chatbot.

RAG combines retrieval and generation models to provide accurate answers with context relevance. You can use LangChain to set up dialogue management, integrate with the RAG framework, and configure interactions to deliver context-aware responses based on the enriched knowledge base.


Access to each data item is strictly based on the chat user’s permission level.


Then you can use Streamlit to create the AI chatbot interface. You can find an end-to-end example of the demo chatbot application in our Git examples repo.
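
As a rough sketch of that interface (not the full demo application from the repo), a minimal Streamlit chat loop could look like the following; it assumes the stream_chain helper, conversational_rag_chain, and session_id that are built in the steps below:

import streamlit as st

st.title('FSx for ONTAP knowledge base chatbot')

# Render the conversation so far
if 'messages' not in st.session_state:
    st.session_state.messages = []
for message in st.session_state.messages:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

# Read the next user prompt, stream the chain's answer back, and remember both
if prompt := st.chat_input('Ask a question about your data'):
    st.session_state.messages.append({'role': 'user', 'content': prompt})
    with st.chat_message('user'):
        st.markdown(prompt)
    with st.chat_message('assistant'):
        # stream_chain and conversational_rag_chain are defined later in this guide
        answer = st.write_stream(stream_chain(conversational_rag_chain, prompt, session_id))
    st.session_state.messages.append({'role': 'assistant', 'content': answer})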


Here is an example of what the chatbot interface looks like when using information from the selected knowledge base:

[Screenshot: demo chatbot interface]


Setting up access to your embedded knowledge base

A knowledge base in workload factory is defined by a name, description, embedding model, chat model, and optional conversation starters. When you create a knowledge base—such as the Default knowledge base in the screenshot below—you can customize all these values to match your AI application requirements. 


On its own, a knowledge base is just a container. You can click the “Add data source” button to open the wizard that guides you through loading your proprietary data into the knowledge base.


[Screenshot: knowledge base with the Add data source option]



FSx for ONTAP organizes your data in a hierarchical structure: file system, volume, and folder. You can select one or more of your existing FSx for ONTAP file systems, volumes, and folders to use as your data source.


Follow the step-by-step guide on how to carry out this process in our post How to deploy and manage RAG knowledge base on FSx for ONTAP with BlueXP workload factory GenAI.

Take note of the knowledge base ID, which you can find in the “Manage Knowledge Base” screen in BlueXP workload factory; you’ll need it in the next steps.

Connecting to your LanceDB database

With workload factory set up, it is now time to move to your preferred Python host or Docker container to create the client application. 

You can connect to the LanceDB database associated with your workload factory environment to access the embedded knowledge base and create the RAG flow that retrieves the embedding vectors most similar to the embedded user prompt for response generation.

  1. Install the required libraries:



pip install lancedb langchain_community boto3 numpy




  2. Connect to the LanceDB instance hosting your embedded FSx for ONTAP knowledge bases:



import lancedb

# Replace 'your_lancedb_host' with your LanceDB host
host = 'your_lancedb_host'

db = lancedb.connect(host)




LanceDB works against a local file system path, so the host is the path to the FSx for ONTAP volume mount point (mounted via NFS), and each knowledge base is a directory (that is, a LanceDB table) within that path.
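
For example, assuming the volume is NFS-mounted at a made-up path such as /mnt/wlmdb, you can list the tables in the database to see one entry per knowledge base:

import lancedb

# Hypothetical NFS mount point of the FSx for ONTAP volume hosting the vector database
host = '/mnt/wlmdb'
db = lancedb.connect(host)

# Each table name corresponds to a workload factory knowledge base ID
print(db.table_names())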


  3. Open the knowledge base table.

Instead of creating and inserting embeddings into the LanceDB instance from scratch, you can just open the table in LanceDB associated with your knowledge base.



# Replace 'your_knowledge_base' with your actual workload factory knowledge base ID. You can find it in the “Manage Knowledge Base” screen in the BlueXP workload factory GenAI.
knowledge_base = 'your_knowledge_base'

table = db.open_table(knowledge_base)



The knowledge base id is the unique identifier for the LanceDB table.


  4. Initialize the Amazon Bedrock embedding model.

The embedding model used by LanceDB needs to match the one used for the knowledge base to ensure that the prompt can be embedded in the same vector space as the original knowledge base for meaningful retrieval. You can extract this information from the LanceDB table metadata.



import boto3
import numpy as np
from langchain_community.embeddings import BedrockEmbeddings

# Replace 'your_region' with the AWS Region of your Amazon Bedrock models
region = 'your_region'
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name=region)

# Extract a small number of rows from the knowledge base table to recover the embedding model used by workload factory
dimensions = table.schema.field("vector").type.list_size
rows = table.search(np.random.random(dimensions)).limit(10).to_list()
bedrock_embedding_model_id = rows[0].get("embedding_model")

# Define the relevant model kwargs for the extracted embedding model
model_kwargs = {}
if bedrock_embedding_model_id == "amazon.titan-embed-text-v1":
    model_kwargs = {}
elif bedrock_embedding_model_id == "amazon.titan-embed-text-v2:0":
    model_kwargs = {"dimensions": dimensions}
else:
    print("Unsupported Amazon Bedrock embedding model:", bedrock_embedding_model_id)

# Initialize an instance of the embedding model
bedrock_embeddings = BedrockEmbeddings(
    model_id=bedrock_embedding_model_id,
    client=bedrock_client,
    model_kwargs=model_kwargs,
)



Here is the LanceDB schema for reference:



export interface EmbeddingMetadata {
    datasource_id: string;
    chunk_id: number;
    inode: number;
    full_path: string;
    acl: string[];
    embedding_model: string;
}
export interface Embedding extends EmbeddingMetadata {
    id: string;
    vector: number[];
    document: string;
}




  5. Create an instance of the LanceDB vector store.

Create a LanceDB object given the database URI, region, embedding model, text key, and table name.



from langchain_community.vectorstores import LanceDB

# Create the vector store on top of the existing knowledge base table
vector_store = LanceDB(
    uri=db.uri,
    region=region,
    embedding=bedrock_embeddings,
    text_key='document',
    table_name=knowledge_base,
)




  6. Create a retriever object from the LanceDB vector store.

First, a word about how workload factory handles permissions. The chatbot uses the metadata extracted during the embedding process in workload factory, which adds the access-permission security identifier (SID) and group identifier (GID) of the original file to each embedded document. You can use this information to filter the data based on the accessor’s SID/GID. In the sample application, you provide the accessor SID in the metadata filter field; otherwise, it defaults to providing access to everyone.


In an enterprise setting, you can connect your application to an identity provider synced with your organization’s Active Directory, so this information comes from authentication rather than a manual filter. Workload factory does this for you if you use it to build your chatbot.


If you have set up vector metadata and you know that the user is only interested in specific metadata values, you should filter the search results before creating the retriever for faster and more efficient execution. 

In this case, we filter by the Access Control List (ACL) information associated with the original FSx for ONTAP data source file.



# Replace 'your_metadata' with your metadata filter (for example, the accessor SID)
metadata = 'your_metadata'

if metadata == "":
    retriever = vector_store.as_retriever()
else:
    sql_filter = f"array_has(acl,'{metadata}') OR array_has(acl,'*:ALLOWED')"
    retriever = vector_store.as_retriever(
        search_kwargs={
            "filter": {
                'sql_filter': sql_filter,
                'prefilter': True,
            }
        }
    )



The retriever object can now be used by the AI chatbot to perform searches and retrieve relevant data based on vector similarity with RAG.
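
Before wiring the retriever into the chatbot, you can optionally sanity-check it with a standalone query; the question below is just an example, and depending on your LangChain version you may need get_relevant_documents() instead of invoke():

# Ask the retriever for the document chunks most similar to a sample question
docs = retriever.invoke('What is our vacation policy?')

for doc in docs:
    # Each result carries the chunk text plus the metadata stored by workload factory
    print(doc.metadata['full_path'], '->', doc.page_content[:80])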


Building your AI chatbot

What happens when a user sends a message to the chatbot interface?

This section walks you through how to build your AI chatbot so that it returns an optimal context-aware response for the user request.

  1. Install the required libraries:



pip install boto3 langchain langchain_core




  2. Create an instance of the chat model. Choose one of the models hosted by Amazon Bedrock as the engine for your AI chatbot.



import boto3
from langchain_community.chat_models import BedrockChat

# Replace the following values with those of your Amazon Bedrock model of choice
model_id = 'your_model_id'
model_kwargs = {'temperature': 0.1}  # replace 0.1 with your preferred temperature
region = 'your_region'

bedrock_client = boto3.client(service_name='bedrock-runtime', region_name=region)

llm = BedrockChat(
    model_id=model_id,
    model_kwargs=model_kwargs,
    streaming=True,
    client=bedrock_client,
)




  3. Enhance the prompt. Use ChatPromptTemplate to extend the user prompt with the chat history and, ideally, a system prompt that the model can use to initialize its behavior.

This step helps the model better understand the user query and the context in which it is being asked.



from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Replace 'your_prompt' with the user prompt
prompt = 'your_prompt'

# Contextualize the question
contextualize_q_system_prompt = (
    "Formulate your response using the latest user question and the "
    "chat history, but make sure that the answer can be understood "
    "without knowledge of the chat history. "
    "The question may or may not reference context in the chat history. "
    "Do NOT answer the question if you don't have enough context for it."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Answer the question
system_prompt = (
    "You are an assistant specialized in question answering. "
    "Use the following context to formulate your answer. "
    "If you don't have enough information to answer, reply with "
    "‘I do not have enough context. Please provide more details.’ "
    "Your answers should be short and to the point, with at "
    "most three sentences used."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)




  4. Create the retrieval chain. Establish a retrieval chain to integrate the chat model with data retrieval and contextual understanding.

Note that chat history is kept for the current session only. If you want to persist the context beyond this session, consider saving it in an external store such as Amazon DynamoDB (see the sketch after the code below).



from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Replace 'your_session_id' with the unique session id
session_id = 'your_session_id'

# In-memory store that maps each session id to its chat history
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt,
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
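
If you want the history to persist beyond the current process, a minimal sketch of swapping the in-memory store for Amazon DynamoDB could look like the following; the table name is a made-up example, and the table must already exist with a partition key named SessionId:

from langchain_community.chat_message_histories import DynamoDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory

# Hypothetical DynamoDB table for persisted chat history
DYNAMODB_TABLE_NAME = 'chatbot-session-history'

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # Each session's messages are stored under their own SessionId item in DynamoDB
    return DynamoDBChatMessageHistory(
        table_name=DYNAMODB_TABLE_NAME,
        session_id=session_id,
    )

You can pass this version of get_session_history to RunnableWithMessageHistory exactly as in the code above.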




  5. Generate the context-aware response. Use the conversational chain to retrieve the source documents with RAG and ask the model to generate a relevant, context-aware response based on the enriched prompt.

Note that if you previously set a permission filter on the LanceDB retriever, the response is based only on embedded documents accessible to the user.



doc_urls = []

def stream_chain(chain, prompt, session_id):
    # Stream the chain's output, yielding answer tokens and collecting the source documents
    response = chain.stream(
        {"input": prompt},
        config={"configurable": {"session_id": session_id}},
    )
    for chunk in response:
        if 'answer' in chunk:
            yield chunk['answer']
        if 'context' in chunk:
            doc_urls.append(chunk['context'][0].metadata['full_path'])

# Print the streamed answer tokens, then the source documents used for the response
for token in stream_chain(conversational_rag_chain, prompt, session_id):
    print(token, end="", flush=True)
print("\nSources:", doc_urls)




By following these steps, you've effectively configured your AI chatbot to handle user queries with enhanced context awareness from your FSx for ONTAP data. Leveraging Amazon Bedrock models and integrating advanced data retrieval and response generation mechanisms ensures your chatbot delivers accurate and relevant answers. This setup not only enhances user interaction but also provides a seamless experience in navigating and retrieving information from your knowledge base.


Query your AI chatbot

Below you will see a few examples showing how to query the Amazon Bedrock chatbot using information in the knowledge base.


In the left panel, you can see that a knowledge base has been selected in the UI using the ID provided by workload factory:


[Screenshot: knowledge base selected in the chatbot UI]


Asking a question that can be answered with information in the FSx for ONTAP knowledge base also provides the source of the answer:


[Screenshot: chatbot answer generated from the knowledge base, including the source file]


[Screenshot: a second example answer with its source reference]


Asking a question about data that is not included in the FSx for ONTAP knowledge base produces a reply from the chatbot that it doesn’t have the information to answer:


[Screenshot: chatbot response when the answer is not in the knowledge base]


What’s next?

Excited to turn your AWS data into personalized AI-powered experiences for your internal teams and customers?


As you’ve just seen in this guide, it’s easy to set up your next context-aware, end-to-end GenAI application using workload factory GenAI.


To learn more, visit the BlueXP workload factory homepage, or get started now.





