Tech ONTAP Blogs
In the ever-evolving landscape of artificial intelligence and machine learning (AI and ML), the adoption of vector databases has emerged as foundational for enhancing the capabilities and performance of retrieval-augmented generation (RAG) systems. These specialized databases are designed to efficiently store, search, and manage vector embeddings, which are high-dimensional representations of data, enabling fast retrieval of relevant information that significantly boosts the intelligence and responsiveness of RAG-based architectures.
Using vector databases in RAG is not merely a technical enhancement; it’s a paradigm shift. By enabling more nuanced and contextually aware retrievals, vector databases empower applications to generate responses that are grounded in the semantic meaning of the data. This leap in relevance is crucial for a wide range of applications, from natural language processing and conversational AI to personalized recommendations and beyond. And it marks a pivotal moment in our journey toward creating more intelligent, efficient, and human-centric AI systems.
In this blog post, we delve into the I/O characteristics of vector databases. Understanding these characteristics is pivotal for effectively using vector databases in RAG deployments, because they directly affect the performance, scalability, and efficiency of these systems.
This section describes the lab setup for our study.
The testbed includes a NetApp® AFF A800 HA pair running ONTAP® 9.14.1 and a Fujitsu PRIMERGY RX2540-M4 host running Ubuntu 22.04. Host and storage are connected through a Cisco switch over 100GbE links.
The NetApp system had four 100GbE connections to the Cisco switch, and the Fujitsu host was connected through a single 100GbE link. To optimize performance for the single host, we configured 48 NetApp FlexVol® volumes, each with one LUN, and mapped all LUNs to the host by using the iSCSI protocol.
On the host, the /etc/iscsi/iscsid.conf file was modified to increase the number of iSCSI sessions from one to four, and multipathd was enabled. A volume group was then established using these 48 LUNs, and a striped logical volume was created to support the XFS file system.
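The host-side steps above can be sketched as a small Python helper that only prints the commands, without running them. This is a sketch under assumptions, not the exact procedure we used: the multipath device names, the volume group and logical volume names, the mount point, and the 64KB stripe size are all hypothetical placeholders. The `node.session.nr_sessions` setting is the standard open-iscsi parameter for the number of sessions per target.

```python
# Setting assumed to be the one changed in /etc/iscsi/iscsid.conf
# (open-iscsi parameter for sessions per target; the default is 1).
ISCSID_SETTING = "node.session.nr_sessions = 4"


def build_lvm_commands(devices, vg_name="vdb_vg", lv_name="vdb_lv",
                       stripe_kb=64, mount_point="/mnt/vectordb"):
    """Return shell commands that stripe one logical volume across all devices.

    All names and the stripe size are hypothetical defaults for illustration.
    """
    stripes = len(devices)
    lv_path = f"/dev/{vg_name}/{lv_name}"
    device_list = " ".join(devices)
    return [
        f"pvcreate {device_list}",
        f"vgcreate {vg_name} {device_list}",
        f"lvcreate --stripes {stripes} --stripesize {stripe_kb}k "
        f"--extents 100%FREE --name {lv_name} {vg_name}",
        f"mkfs.xfs {lv_path}",
        f"mount {lv_path} {mount_point}",
    ]


# Hypothetical multipath aliases for the 48 LUNs.
devices = [f"/dev/mapper/lun{i:02d}" for i in range(1, 49)]
commands = build_lvm_commands(devices)

print(ISCSID_SETTING)
for command in commands:
    print(command)
```

Striping across all 48 LUNs spreads each large I/O over many volumes, which is the reason for creating so many LUNs for a single host.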
This section outlines the configuration of the software stack that we used during our performance measurements.
VectorDB-Bench is a vector database benchmark tool designed for user-friendliness. It enables anyone to easily replicate tests or evaluate new systems, simplifying the selection process among numerous cloud and open-source providers.
VectorDB-Bench tests mimic real-world conditions, including data insertion and various search functions. The tests use public datasets drawn from real production scenarios, such as SIFT, GIST, Cohere, and a dataset generated by OpenAI.
Milvus is a database that is engineered specifically for storing, indexing, and managing the vast amounts of embedding vectors generated by deep neural networks and other machine learning models. Designed to operate on a scale of billions of vectors, Milvus excels in handling embedding vectors derived from unstructured data, a task that traditional relational databases, which focus on structured data, cannot perform.
As the volume of unstructured data grows (emails, social media content, IoT sensor data, and so on), Milvus can store this data in the form of vector embeddings. This ability allows it to measure the similarity between vectors and, by extension, the similarity of the data sources that they originate from.
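To make "similarity between vectors" concrete, here is a minimal sketch of cosine similarity, one common similarity measure for embeddings. The three-dimensional vectors are made-up values for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database may use a different metric (for example, Euclidean distance or inner product).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by any real model).
email_a = [0.9, 0.1, 0.0]
email_b = [0.8, 0.2, 0.1]
sensor_log = [0.0, 0.1, 0.9]

print(cosine_similarity(email_a, email_b))     # close to 1: similar content
print(cosine_similarity(email_a, sensor_log))  # close to 0: unrelated content
```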
Pgvecto.rs is a PostgreSQL extension that enhances the relational database with vector similarity search capabilities. It is developed in Rust and builds on the framework provided by pgrx.
Hierarchical Navigable Small World index
The Hierarchical Navigable Small World (HNSW) index is a type of data structure used in vector databases for efficient search of high-dimensional data. It’s particularly good at finding the nearest neighbors in this kind of data, which is a common requirement for many machine learning applications, such as recommendation systems and similarity searches. How does it work?
Imagine that you’re at a large party and you need to find a group of people who share your interests out of hundreds of guests. Walking up to each person to find out if they’re a match would take a long time. Instead, HNSW organizes people into groups based on how similar they are to each other, creating layers of these groups from very broad to very specific. When you start your search, you first interact with the broad groups, which quickly guide you to increasingly specific groups until you find your best matches without having to meet everyone at the party.
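The party analogy can be sketched in a few lines of Python. This is a deliberately simplified illustration of layered greedy search, not the real HNSW algorithm: it picks the members of each upper layer by plain random sampling and links every node to its nearest neighbors by brute force. Real HNSW builds the graph incrementally, assigns levels with an exponentially decaying probability, and searches with a candidate list rather than a single current node. All function names and parameters here are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_layers(points, num_layers=3, m=4, seed=0):
    """Build toy layers: layers[0] holds every point, higher layers are sparser.

    In each layer, every node links to its m nearest neighbors in that layer.
    """
    rng = random.Random(seed)
    members = list(range(len(points)))
    layers = []
    for _ in range(num_layers):
        graph = {}
        for i in members:
            others = sorted((j for j in members if j != i),
                            key=lambda j: dist(points[i], points[j]))
            graph[i] = others[:m]
        layers.append(graph)
        # Keep roughly a quarter of the nodes for the next, sparser layer.
        members = rng.sample(members, max(1, len(members) // 4))
    return layers


def greedy_search(points, layers, query):
    """Descend from the sparsest layer, always moving to a closer neighbor."""
    current = next(iter(layers[-1]))  # arbitrary entry point in the top layer
    for graph in reversed(layers):
        improved = True
        while improved:
            improved = False
            for neighbor in graph[current]:
                if dist(points[neighbor], query) < dist(points[current], query):
                    current = neighbor
                    improved = True
    return current


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
layers = build_layers(points)
query = (0.5, 0.5)

found = greedy_search(points, layers, query)
exact = min(range(len(points)), key=lambda i: dist(points[i], query))
print("greedy:", found, dist(points[found], query))
print("exact: ", exact, dist(points[exact], query))
```

The greedy result is approximate: it can stop at a point that is close to the query but not the closest one, which is the trade-off that recall measures in the benchmark results.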
Disk-Approximate Nearest Neighbor index
The Disk-Approximate Nearest Neighbor (DiskANN) index is a type of indexing mechanism designed to efficiently perform nearest neighbor searches on very large datasets that don’t fit entirely into the main memory of a host, but rather need to be stored on disk. How does it work?
Suppose that you have a huge library of books, far more than could fit on a single shelf or even in an entire room. You need a system to find the most relevant book based on a topic you’re interested in. However, space constraints mean that you can’t possibly have all the books laid out in front of you at once, so you need a smart way to store and retrieve them. DiskANN creates an efficient pathway to retrieve the most relevant books (or data points) from your storage (the disk), even though they’re not all immediately accessible in your main memory. It optimizes the layout of data on the disk and intelligently caches parts of the data to minimize the disk access times, which are typically the bottleneck in such large-scale systems.
HNSW versus DiskANN
In summary, HNSW is highly efficient for datasets that can fit within the server’s main memory (RAM), leveraging fast memory access to speed up the search for nearest neighbors in high-dimensional space. However, its effectiveness is bounded by the amount of RAM available, which can limit its use with extremely large datasets.
On the other hand, DiskANN is designed to handle situations where the dataset is too large to fit into RAM. It uses clever strategies to minimize the performance penalties of having to fetch data from slower disk storage, thereby extending the potential size of the dataset to the limits of disk capacity. This makes DiskANN suitable for massive datasets, trading off some speed for the ability to handle larger amounts of data.
We started our setup by deploying a Milvus standalone instance using a shell script, available at https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh. The script spins up a set of three containers, which constitute the Milvus database service.
Next, we measured the performance of the Milvus database instance using two datasets. The OpenAI dataset contains 5 million vectors, each with 1,536 dimensions, and was measured with the DiskANN index. The LAION dataset contains 10 million vectors, each with 768 dimensions, and was measured with the HNSW index. The LAION dataset was also used in the comparison of Milvus versus pgvecto.rs.
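As a rough sizing aid, the raw footprint of each dataset can be estimated from the vector count and dimensionality. The sketch below assumes 4-byte (float32) values, which is an assumption on our part and not something the benchmark output states, and it ignores index overhead and metadata.

```python
def raw_size_gb(num_vectors, dimensions, bytes_per_value=4):
    """Raw embedding size in GB (10**9 bytes), excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1e9


# Assumes float32 values (4 bytes each); adjust bytes_per_value otherwise.
openai_gb = raw_size_gb(5_000_000, 1536)
laion_gb = raw_size_gb(10_000_000, 768)

print(f"OpenAI: {openai_gb:.2f} GB, LAION: {laion_gb:.2f} GB")
# prints "OpenAI: 30.72 GB, LAION: 30.72 GB"
```

Under that assumption, the two datasets hold the same amount of raw vector data: half as many vectors at twice the dimensionality.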
The measurement using the DiskANN index focused on understanding the I/O characteristics of this type of index. The measurement using the HNSW index focused on checking whether there would be any I/O at all, because HNSW is an in-memory index. The HNSW measurement was also used for the performance comparison between Milvus and pgvecto.rs.
To capture the I/O characteristics of the database during the vectordb-bench process, we recorded the start and end dates and times for each run and generated an ONTAP performance archive corresponding to the measurement periods.
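One simple way to record such measurement windows is a small context manager wrapped around each benchmark run. This is a hypothetical helper for illustration, not the tooling we used; the label and the placeholder body stand in for an actual benchmark invocation.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

run_windows = []  # one (label, start, end) tuple per benchmark run


@contextmanager
def record_window(label):
    """Record UTC start and end timestamps around a block of work."""
    start = datetime.now(timezone.utc)
    try:
        yield
    finally:
        run_windows.append((label, start, datetime.now(timezone.utc)))


with record_window("milvus-diskann-query"):
    pass  # placeholder: launch the benchmark run here

for label, start, end in run_windows:
    print(label, start.isoformat(), end.isoformat())
```

The recorded timestamps can then be matched against the time range of the storage-side performance data for the same run.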
When the Milvus measurements were completed, we switched the database to PostgreSQL running with pgvecto.rs 0.2.0.
A note about the index types that we used in our measurements: For Milvus, which supports HNSW and DiskANN, we collected measurements with both indexes. At the time that we measured performance, pgvecto.rs didn’t have support for DiskANN, so we collected measurements with HNSW only.
First, let’s examine the performance of Milvus and pgvecto.rs using the HNSW index. Pgvecto.rs delivered 1,068 queries per second (QPS) with a recall of 0.6344, whereas Milvus managed 106 QPS but achieved a higher recall of 0.9842. In terms of 99th percentile latency, Milvus was marginally better than pgvecto.rs.
From the perspective of storage, there was no disk I/O, which aligns with expectations, because the index is memory-based and was completely loaded into RAM.
When precision in query results is important, according to the benchmark results, Milvus is superior to pgvecto.rs because it retrieves a higher proportion of relevant items for each query.
When query throughput is the priority, pgvecto.rs outperforms Milvus in terms of QPS. However, the relevance of the retrieved data is compromised: With a recall of 0.6344, about 37% of the results are not pertinent to the specified query.
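The trade-off can be checked with a few lines of arithmetic. The sketch below first shows how recall is commonly computed for a single query (the IDs are made up), and then derives the miss rate and the throughput ratio from the HNSW figures reported in this post. The helper name is hypothetical and is not part of VectorDB-Bench.

```python
def recall_at_k(retrieved, ground_truth):
    """Fraction of the true nearest neighbors that appear in the result list."""
    truth = set(ground_truth)
    return len(truth.intersection(retrieved)) / len(truth)


# Toy example with made-up IDs: 3 of the 4 true neighbors were returned.
print(recall_at_k([7, 2, 9, 4], [2, 4, 7, 8]))  # 0.75

# Figures reported in this post for the HNSW runs.
pgvecto_qps, pgvecto_recall = 1068, 0.6344
milvus_qps, milvus_recall = 106, 0.9842

missed_pgvecto = 1 - pgvecto_recall  # share of true neighbors not returned
missed_milvus = 1 - milvus_recall
qps_ratio = pgvecto_qps / milvus_qps

print(f"pgvecto.rs misses {missed_pgvecto:.1%}, Milvus misses {missed_milvus:.1%}")
print(f"pgvecto.rs delivers {qps_ratio:.1f}x the QPS of Milvus")
```

In this run, roughly ten times the throughput came at the cost of missing more than a third of the true nearest neighbors. Whether that exchange is acceptable depends on the application.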
Let’s now examine Milvus using the DiskANN index. Milvus reached 10.93 QPS with a recall rate of 0.9987 and a 99th percentile latency of 708.2 milliseconds. Notably, the host CPU, operating at full capacity throughout, was the primary bottleneck.
From a storage point of view, the data ingestion and post-insert optimization phase primarily involved a mix of read and write operations, predominantly writes, with an average I/O size of 64KB. During the query phase, the workload consisted entirely of random read operations, with an average I/O size of 8KB.
In reviewing the index implementations for vector databases, HNSW emerges as the predominant type, largely due to its established presence. DiskANN, being a newer technology, is not yet as universally adopted. However, as generative AI applications expand and the associated data grows, more developers are integrating DiskANN options into vector databases.
DiskANN is increasingly important for managing large, high-dimensional datasets that exceed RAM capacities, and it is gaining traction in the market. Its disk I/O profile is well suited for modern flash-based storage systems, like NetApp AFF A-Series and C-Series, ensuring that it handles large data volumes efficiently.
[1] VectorDB Benchmark. https://github.com/zilliztech/VectorDBBench
[2] Milvus Vector Database. https://milvus.io/docs
[3] Postgres pgvecto.rs Database. https://docs.pgvecto.rs/getting-started/overview.html