For organizations that are invested in cloud and hybrid solutions, AWS re:Invent is one of the most important tech conferences to close out the year. NetApp is excited to participate and to share a few of the new solutions that our partnership with Amazon is bringing to market.
In the ever-evolving landscape of artificial intelligence and machine learning (AI and ML), the adoption of vector databases has emerged as foundational for enhancing the capabilities and performance of retrieval-augmented generation (RAG) systems. These specialized databases are designed to efficiently store, search, and manage vector embeddings, which are high-dimensional representations of data, enabling fast retrieval of relevant information that significantly boosts the intelligence and responsiveness of RAG-based architectures.
Using vector databases in RAG is not merely a technical enhancement; it’s a paradigm shift. By enabling more nuanced and contextually aware retrievals, vector databases empower applications to generate responses that are grounded in the semantic meaning of the data. This leap in relevance is crucial for a wide range of applications, from natural language processing and conversational AI to personalized recommendations and beyond. And it marks a pivotal moment in our journey toward creating more intelligent, efficient, and human-centric AI systems.
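To make the idea of semantically grounded retrieval concrete, here is a minimal, illustrative sketch in Python. The documents and their tiny 4-dimensional "embeddings" are invented for the example; a real RAG system would obtain high-dimensional embeddings from a model and delegate the search to a vector database rather than brute-force scanning.

```python
from math import sqrt

# Toy corpus: hand-made 4-dimensional "embeddings" (an assumption for
# brevity; real embeddings come from a model and have hundreds of dims).
documents = {
    "ONTAP supports NFS and iSCSI protocols.": [0.9, 0.1, 0.0, 0.2],
    "Milvus stores billions of embedding vectors.": [0.1, 0.9, 0.3, 0.0],
    "HNSW is an in-memory graph index.": [0.0, 0.4, 0.9, 0.1],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(documents, key=lambda d: cosine(documents[d], query_vec),
                    reverse=True)
    return ranked[:k]
```

A query vector close in direction to the Milvus document's embedding retrieves that document, regardless of keyword overlap; this is the semantic matching that vector databases perform at scale.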
In this blog post, we delve into the I/O characteristics of vector databases. Understanding these characteristics is pivotal for effectively using vector databases in RAG deployments, because they directly affect the performance, scalability, and efficiency of these systems.
Table of Contents
Lab setup
Infrastructure
Software
Benchmark: VectorDB-Bench
Vector databases
Index types
Methodology
Results and lessons learned
Results
Lessons learned
References
Lab setup
This section describes the lab setup for our study.
Infrastructure
The testbed includes a NetApp® AFF A800 HA pair running ONTAP® 9.14.1, a Fujitsu PRIMERGY RX2540-M4 running Ubuntu 22.04, and host-to-storage connections through a Cisco switch over 100GbE links.
The system setup included the NetApp system with four 100GbE connections to a Cisco switch, and the Fujitsu host connected via a single 100GbE link. For performance optimization relative to the single host, 48 NetApp FlexVol® volumes were configured, each with one LUN, all mapped to the host by using the iSCSI protocol.
On the host, the /etc/iscsi/iscsid.conf file was modified to increase the number of iSCSI sessions from one to four, and multipathd was enabled. A volume group was then established using these 48 LUNs, and a striped logical volume was created to support the XFS file system.
Software
This section outlines the configuration of the software stack that we used during our performance measurements.
Benchmark: VectorDB-Bench
VectorDB-Bench is a vector database benchmark tool designed for user-friendliness. It enables anyone to easily replicate tests or evaluate new systems, simplifying the selection process among numerous cloud and open-source providers.
VectorDB-Bench tests mimic real-world conditions, including data insertion and various search functions, using public datasets from actual production environments like SIFT, GIST, Cohere, and one generated by OpenAI.
Vector databases
Milvus
Milvus is a database that is engineered specifically for storing, indexing, and managing the vast amounts of embedding vectors generated by deep neural networks and other machine learning models. Designed to operate on a scale of billions of vectors, Milvus excels in handling embedding vectors derived from unstructured data, a task that traditional relational databases, which focus on structured data, cannot perform.
With the rise in the volume of unstructured data, such as emails, social media content, and IoT sensor data, Milvus can store this data in the form of vector embeddings. This ability allows it to measure the similarity between vectors and, by extension, the similarity of the data source they originate from.
PostgreSQL pgvecto.rs extension
Pgvecto.rs is a PostgreSQL extension that enhances the relational database with vector similarity search capabilities. It is developed in Rust and builds on the framework provided by pgrx.
Index types
Hierarchical Navigable Small World index
The Hierarchical Navigable Small World (HNSW) index is a type of data structure used in vector databases for efficient search of high-dimensional data. It’s particularly good at finding the nearest neighbors in this kind of data, which is a common requirement for many machine learning applications, such as recommendation systems and similarity searches. How does it work?
Imagine that you’re at a large party and you need to find a group of people who share your interests out of hundreds of guests. Walking up to each person to find out if they’re a match would take a long time. Instead, HNSW organizes people into groups based on how similar they are to each other, creating layers of these groups from very broad to very specific. When you start your search, you first interact with the broad groups, which quickly guide you to increasingly specific groups until you find your best matches without having to meet everyone at the party.
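The party analogy can be sketched in a few lines of Python. This is a deliberately simplified two-layer stand-in for HNSW, not the real algorithm: the 2-D toy dataset, the layer construction, and the candidate-selection rule are all invented for illustration (real HNSW maintains proximity-graph edges per layer rather than sorting).

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)

random.seed(0)

# Toy dataset: 2-D points stand in for high-dimensional embeddings.
points = [(random.random(), random.random()) for _ in range(200)]

# Two layers: a sparse "broad groups" layer that routes the search,
# and a dense base layer containing every point.
upper_layer = list(range(0, 200, 20))  # every 20th point
base_layer = list(range(200))

def closest(query, candidates):
    """Greedy step: return the candidate index nearest to the query."""
    return min(candidates, key=lambda i: dist(points[i], query))

def layered_search(query, beam=30):
    # 1. Coarse routing through the sparse upper layer ("broad groups").
    entry = closest(query, upper_layer)
    # 2. Refine among base-layer points near the entry point, instead of
    #    scanning everyone at the party. (Real HNSW follows graph edges
    #    here; sorting is a shortcut to keep the sketch small.)
    neighborhood = sorted(base_layer,
                          key=lambda i: dist(points[i], points[entry]))[:beam]
    return closest(query, neighborhood)
```

The result is at least as close to the query as the best upper-layer entry point, because refinement only improves on the routing step.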
Disk-Approximate Nearest Neighbor index
The Disk-Approximate Nearest Neighbor (DiskANN) index is a type of indexing mechanism designed to efficiently perform nearest neighbor searches on very large datasets that don’t fit entirely into the main memory of a host, but rather need to be stored on disk. How does it work?
Suppose that you have a huge library of books, far more than could fit on a single shelf or even in an entire room. You need a system to find the most relevant book based on a topic you’re interested in. However, space constraints mean that you can’t possibly have all the books laid out in front of you at once, so you need a smart way to store and retrieve them. DiskANN creates an efficient pathway to retrieve the most relevant books (or data points) from your storage (the disk), even though they’re not all immediately accessible in your main memory. It optimizes the layout of data on the disk and intelligently caches parts of the data to minimize disk access times, which are typically the bottleneck in such large-scale systems.
HNSW versus DiskANN
In summary, HNSW is highly efficient for datasets that can fit within the server’s cache (RAM), leveraging fast memory access to speed up the search for nearest neighbors in high-dimensional space. However, its effectiveness is bounded by the amount of RAM available, which can limit its use in extremely large datasets.
On the other hand, DiskANN is designed to handle situations where the dataset is too large to fit into RAM. It uses clever strategies to minimize the performance penalties of having to fetch data from slower disk storage, thereby extending the potential size of the dataset to the limits of disk capacity. This makes DiskANN suitable for massive datasets, trading off some speed for the ability to handle larger amounts of data.
Methodology
We started our setup by deploying a Milvus standalone instance using a shell script, available at https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh. The script spins up a set of three containers, which constitute the Milvus database service.
Next, we measured the performance of the Milvus database instance using two datasets. The OpenAI dataset contains 5 million vectors, each with 1,536 dimensions, and was tested with the DiskANN index. The LAION dataset contains 10 million vectors, each with 768 dimensions, and was tested with the HNSW index.
The measurement with the DiskANN index focused on understanding the I/O characteristics of this type of index. The measurement with the HNSW index focused on checking whether there would be any I/O at all, because HNSW is an in-memory index; this measurement was also used for the performance comparison between Milvus and pgvecto.rs.
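A back-of-the-envelope calculation shows why index choice matters at these dataset sizes. Assuming 4-byte floats (the encoding is an assumption here), the raw vectors alone occupy roughly 29GiB in either dataset, before any index or metadata overhead, which is why an index that must live entirely in RAM becomes a constraint:

```python
def raw_vector_footprint_gib(num_vectors, dims, bytes_per_float=4):
    """Raw size of the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dims * bytes_per_float / 2**30

openai_gib = raw_vector_footprint_gib(5_000_000, 1536)   # DiskANN measurement
laion_gib = raw_vector_footprint_gib(10_000_000, 768)    # HNSW measurement
# Both datasets work out to roughly 28.6 GiB of raw vector data.
```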
To capture the I/O characteristics of the database during the vectordb-bench process, we recorded the start and end dates and times for each run and generated an ONTAP performance archive corresponding to the measurement periods.
When the Milvus measurements were completed, we switched the database to PostgreSQL running with pgvecto.rs 0.2.0.
A note about the index types that we used in our measurements: for Milvus, which supports both HNSW and DiskANN, we collected measurements with both indexes. At the time we measured performance, pgvecto.rs didn’t support DiskANN, so we collected measurements with HNSW only.
Results and lessons learned
Results
First, let’s examine the performance of Milvus and pgvecto.rs using the HNSW index. pgvecto.rs delivered 1,068 queries per second (QPS) with a recall rate of 0.6344, whereas Milvus managed 106 QPS but achieved a much higher recall of 0.9842. In terms of 99th percentile latency, Milvus demonstrated marginally better latency than pgvecto.rs.
From the perspective of storage, there was no disk I/O, which aligns with expectations, because the index is memory-based and was completely loaded into RAM.
According to the benchmark results, when precision in query results is important, Milvus is superior to pgvecto.rs, because it retrieves a higher proportion of relevant items for each query.
When query throughput is the priority, pgvecto.rs outperforms Milvus in terms of QPS. However, it’s important to note that the relevance of the retrieved data is compromised: about 37% of the results are not pertinent to the specified query.
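The 37% figure is simply the complement of the measured recall, as a quick check confirms:

```python
recall_pgvecto = 0.6344                 # measured recall from the benchmark run
irrelevant_fraction = 1 - recall_pgvecto
# About 37% of returned results are not among the true nearest neighbors.
```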
Let’s now examine Milvus using the DiskANN index. Milvus reached 10.93 QPS with a recall rate of 0.9987 and a 99th percentile latency of 708.2 milliseconds. Notably, the host CPU, operating at full capacity throughout, was the primary bottleneck.
From a storage point of view, the data ingestion and post-insert optimization phase primarily involved a mix of read and write operations, predominantly writes, with an average I/O size of 64KB. During the query phase, the workload consisted entirely of random read operations, with an average I/O size of 8KB.
Lessons learned
In reviewing the index implementations for vector databases, HNSW emerges as the predominant type, largely due to its established presence. DiskANN, being a newer technology, is not yet as universally adopted. However, as generative AI applications expand and the associated data grows, more developers are integrating DiskANN options into vector databases.
DiskANN is increasingly important for managing large, high-dimensional datasets that exceed RAM capacities, and it is gaining traction in the market. Its disk I/O profile is well suited for modern flash-based storage systems, like NetApp AFF A-Series and C-Series, ensuring that it handles large data volumes efficiently.
References
[1] VectorDB Benchmark. https://github.com/zilliztech/VectorDBBench
[2] Milvus Vector Database. https://milvus.io/docs
[3] Postgres pgvecto.rs Database. https://docs.pgvecto.rs/getting-started/overview.html
ONTAP FlexGroup volumes offer incredible performance and massive capacity scalability. What if you have data today in a Flexible volume (FlexVol) but decide you want a FlexGroup volume? How best do you go about this change?
First, we should confirm that you should in fact consider the change. Starting with ONTAP 9.12.1, the maximum FlexVol size tripled from 100TB to 300TB. If your workload is well under 300TB and you don’t expect it to grow to that level, a FlexVol might be the best place to stay. Every version of ONTAP delivers better single-volume performance, so simply upgrading to a more current version of ONTAP might be the best course of action.
However, what if you decide that you do need to convert to a FlexGroup volume? What is the best process? First, we need to determine the reason for the move, because that will determine the path forward.
Need greater capacity
Let’s say you have a large pool of cool data. The performance of a FlexVol is sufficient, but you need to grow that pool of data well beyond 300TB. In this case you could perform an in-place FlexVol to FlexGroup volume conversion.
An in-place conversion leaves all existing data in the original volume but expands the data container by adding member volumes, creating a FlexGroup volume. The conversion does not accelerate access to the original data, but it does allow seamless expansion to multiple petabytes of capacity, with new data placed optimally across the new member volumes.
It is important to note that after the conversion, you will have one member volume (the original volume) that is quite full and other member volumes that are empty. This is normal, and the imbalance will equalize over time.
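A toy model illustrates why the imbalance works itself out. ONTAP's real placement logic is far more sophisticated than this, but a simple "least-full member wins" rule, with invented capacity numbers, shows the idea: new files land on the empty members until usage evens out, while the original full member is left alone.

```python
# Hypothetical post-conversion state: the original volume (member1) is
# nearly full; the new members added by the conversion are empty.
members = {"member1": 280, "member2": 0, "member3": 0, "member4": 0}  # TB used

def place_new_file(size_tb):
    """Place a new file on the least-full member (a simplification of
    ONTAP's actual placement heuristics)."""
    target = min(members, key=members.get)
    members[target] += size_tb
    return target
```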
Need greater performance
What if, instead of needing greater capacity, you have a workload that now demands much greater performance, perhaps exceeding what a single storage node can satisfy? In this case, a different approach is recommended.
For these cases, a new separate FlexGroup volume should be created with an optimal number of member volumes. ONTAP does this automatically, and this number will vary based on how large your data set is and the composition of your ONTAP cluster. Data in the original FlexVol should be copied to the new FlexGroup volume.
This is most easily done by mounting the new FlexGroup volume at a different mount point and then copying the data with NetApp’s free XCP software. XCP can copy large amounts of data quickly, and the new FlexGroup volume places the data in an optimal layout across its member volumes.
A short cutover is then required: clients using the original FlexVol unmount it, XCP performs a final copy pass, and the FlexVol is taken offline. The new FlexGroup volume can then be mounted in its place, and the clients can remount the share.
Although this approach is more work, it leaves you with an optimally laid-out FlexGroup volume, ready to deliver a large increase in throughput and to grow as needed.
There you have it: if you need to move from a FlexVol to a FlexGroup volume, you have two options, depending on your needs and situation.