AI requires scalable, accessible, and efficient data management, but many enterprises struggle to manage data seamlessly across hybrid and multi-cloud environments. The latest integration, Domino Volumes for NetApp ONTAP (DVNO), provides a solution: it enables rapid access to data across environments without DevOps overhead and reduces costs and processing times by up to 50%.
How? Domino’s first-party integration with NetApp’s intelligent data infrastructure doubles read performance and GPU throughput over previous configurations. For resource-intensive AI use cases requiring distributed GPU training, like computer vision and LLM training/fine-tuning, Domino customers can now run GPUs for half as long.
Create DVNO Volumes from Domino
With Domino's new DVNO feature, users can create storage volumes powered by NetApp ONTAP and BlueXP. Data scientists can provision scalable storage volumes directly within the Domino interface, without IT involvement or DevOps work. This capability is especially valuable for large enterprise data science teams, which need quick, reliable access to data without waiting for infrastructure provisioning. By simplifying volume creation, DVNO reduces delays and lets teams focus on experimenting and iterating faster.
Figures 1 and 2: Creating a Domino Volume for NetApp ONTAP (DVNO) from Domino’s platform
Collaborate and Control Access to DVNO Volumes from Domino
DVNO volumes can be shared directly with other users through Domino. Data scientists can share volumes across projects, enabling straightforward access to shared datasets. Sharing data in this manner is crucial for enterprise-scale collaboration, allowing different teams and stakeholders to access consistent, up-to-date datasets without duplication or manual data transfer. This not only improves collaboration but also reduces storage overhead and potential inconsistencies.
Figure 3: Data scientists have self-service access to attach shared data volumes to executions, accelerating iteration and innovation across the model lifecycle.
Monitor DVNO Volumes from Domino
DVNO provides straightforward access control, so IT administrators can monitor permissions and data usage through secure, consistent management across all environments. Standard data access patterns for developers and API users mean that volumes can be shared, updated, or restricted securely, keeping sensitive data protected.
For data science teams, this level of control is essential to maintain compliance and meet enterprise security requirements, while still allowing the flexibility needed to work efficiently. IT teams can ensure that only authorized users have access, minimizing the risk of data breaches.
Figure 4: Domino application admins can see a list of all DVNO volumes and metadata, such as size and who has access.
Enhance Data Organization with User and Project-based Storage Volumes
By empowering data scientists to self-manage ONTAP volumes, DVNO enables teams to create dedicated storage volumes tailored to specific users, projects, or workflows. This structure simplifies data organization and enhances data governance by isolating access to sensitive datasets.
For IT teams, the ability of data scientists to independently manage volumes reduces the provisioning and maintenance workload, freeing up valuable resources for strategic initiatives rather than day-to-day support. Each project also gets its own space, which minimizes the risk of data conflicts, reduces storage overhead, and ensures that every team member works with the most relevant, up-to-date data, improving both productivity and security.
Figure 5: IT admins can see a list of all DVNO volumes and metadata, such as size, in BlueXP.
Conclusion
The Domino and NetApp partnership continues to evolve with deeper integrations to enhance AI lifecycle management and productivity. Intelligent data mobility, optimized hybrid operations, and seamless access to critical data are now available through the Domino Volumes for NetApp ONTAP integration. This allows data science teams to focus on building models without being slowed by data bottlenecks. Stay tuned for more developments as we expand our AI infrastructure capabilities.
Ready to learn more? Check out the Domino Volumes for NetApp ONTAP demo, read Domino’s recent press release, and discover more insights at domino.ai/partners/netapp.
For organizations that are invested in cloud and hybrid solutions, AWS re:Invent is one of the most important tech conferences to close out the year. NetApp is excited to participate and to share a few of the new solutions that our partnership with Amazon is bringing to market.
In the ever-evolving landscape of artificial intelligence and machine learning (AI and ML), the adoption of vector databases has emerged as foundational for enhancing the capabilities and performance of retrieval-augmented generation (RAG) systems. These specialized databases are designed to efficiently store, search, and manage vector embeddings, which are high-dimensional representations of data, enabling fast retrieval of relevant information that significantly boosts the intelligence and responsiveness of RAG-based architectures.
Using vector databases in RAG is not merely a technical enhancement; it’s a paradigm shift. By enabling more nuanced and contextually aware retrievals, vector databases empower applications to generate responses that are grounded in the semantic meaning of the data. This leap in relevance is crucial for a wide range of applications, from natural language processing and conversational AI to personalized recommendations and beyond. And it marks a pivotal moment in our journey toward creating more intelligent, efficient, and human-centric AI systems.
In this blog post, we delve into the I/O characteristics of vector databases. Understanding these characteristics is pivotal for effectively using vector databases in RAG deployments, because they directly affect the performance, scalability, and efficiency of these systems.
Table of Contents
Lab setup
Infrastructure
Software
Benchmark: VectorDB-Bench
Vector databases
Index types
Methodology
Results and lessons learned
Results
Lessons learned
References
Lab setup
This section describes the lab setup for our study.
Infrastructure
The testbed includes a NetApp® AFF A800 HA pair running ONTAP® 9.14.1 and a Fujitsu PRIMERGY RX2540-M4 running Ubuntu 22.04, with host and storage connected through a Cisco switch over 100GbE.
The NetApp system connected to the Cisco switch over four 100GbE links, and the Fujitsu host connected over a single 100GbE link. To optimize performance for the single host, 48 NetApp FlexVol® volumes were configured, each with one LUN, and all LUNs were mapped to the host by using the iSCSI protocol.
On the host, the /etc/iscsi/iscsid.conf file was modified to increase the number of iSCSI sessions from one to four, and multipathd was enabled. A volume group was then established using these 48 LUNs, and a striped logical volume was created to support the XFS file system.
Software
This section outlines the configuration of the software stack that we used during our performance measurements.
Benchmark: VectorDB-Bench
VectorDB-Bench is a vector database benchmark tool designed for user-friendliness. It enables anyone to easily replicate tests or evaluate new systems, simplifying the selection process among numerous cloud and open-source providers.
VectorDB-Bench tests mimic real-world conditions, including data insertion and various search functions, using public datasets from actual production environments like SIFT, GIST, Cohere, and one generated by OpenAI.
Vector databases
Milvus
Milvus is a database that is engineered specifically for storing, indexing, and managing the vast amounts of embedding vectors generated by deep neural networks and other machine learning models. Designed to operate on a scale of billions of vectors, Milvus excels in handling embedding vectors derived from unstructured data, a task that traditional relational databases, which focus on structured data, cannot perform.
With the rise in the volume of unstructured data, such as emails, social media content, and IoT sensor data, Milvus can store this data in the form of vector embeddings. This ability allows it to measure the similarity between vectors and, by extension, the similarity of the data sources they originate from.
PostgreSQL pgvecto.rs extension
Pgvecto.rs is a PostgreSQL extension that enhances the relational database with vector similarity search capabilities. It is developed in Rust and builds on the framework provided by pgrx.
Index types
Hierarchical Navigable Small World index
The Hierarchical Navigable Small World (HNSW) index is a type of data structure used in vector databases for efficient search of high-dimensional data. It’s particularly good at finding the nearest neighbors in this kind of data, which is a common requirement for many machine learning applications, such as recommendation systems and similarity searches. How does it work?
Imagine that you’re at a large party and you need to find a group of people who share your interests out of hundreds of guests. Walking up to each person to find out if they’re a match would take a long time. Instead, HNSW organizes people into groups based on how similar they are to each other, creating layers of these groups from very broad to very specific. When you start your search, you first interact with the broad groups, which quickly guide you to increasingly specific groups until you find your best matches without having to meet everyone at the party.
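To make the layered-search idea concrete, here is a minimal, self-contained Python sketch of the greedy walk that HNSW performs within a single graph layer. The graph, vectors, and entry point are toy values chosen for illustration; a real HNSW index maintains multiple layers and tuning parameters such as M and efSearch.

```python
import numpy as np

def greedy_search(vectors, neighbors, entry, query):
    """Greedy nearest-neighbor walk over one proximity-graph layer (toy HNSW step).

    vectors:   dict node_id -> np.ndarray embedding
    neighbors: dict node_id -> list of connected node_ids
    entry:     node_id where the search starts (in HNSW, handed down from the layer above)
    query:     np.ndarray query embedding
    """
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        improved = False
        # Move to any neighbor that is closer to the query than the current node.
        for cand in neighbors[current]:
            d = np.linalg.norm(vectors[cand] - query)
            if d < current_dist:
                current, current_dist, improved = cand, d, True
        if not improved:
            return current, current_dist  # local minimum: best match on this layer

# Toy data: five 2-D points with a hand-built neighbor graph (illustrative only).
vecs = {i: v for i, v in enumerate(np.array([[0, 0], [1, 0], [2, 1], [3, 3], [5, 5]], dtype=float))}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(vecs, graph, entry=0, query=np.array([4.0, 4.0])))
```

The layered structure of a real HNSW index simply repeats this walk from coarse layers to fine ones, which is what lets the search skip most of the "party guests" instead of visiting them all.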
Disk-Approximate Nearest Neighbor index
The Disk-Approximate Nearest Neighbor (DiskANN) index is a type of indexing mechanism designed to efficiently perform nearest neighbor searches on very large datasets that don’t fit entirely into the main memory of a host, but rather need to be stored on disk. How does it work?
Suppose that you have a huge library of books, far more than could fit on a single shelf or even in an entire room. You need a system to find the most relevant book based on a topic you’re interested in. However, space constraints mean that you can’t possibly have all the books laid out in front of you at once, so you need a smart way to store and retrieve them. DiskANN creates an efficient pathway to retrieve the most relevant books (or data points) from your storage (the disk), even though they’re not all immediately accessible in your main memory. It optimizes the layout of data on the disk and intelligently caches parts of the data to minimize the disk access times, which are typically the bottleneck in such large-scale systems.
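As a rough illustration of why DiskANN's cost is dominated by disk reads, the Python sketch below reuses the toy graph from the HNSW example but counts a "disk read" every time it must fetch a neighbor list that is not already cached in memory. The graph, cache policy, and data are hypothetical; real DiskANN (Vamana) uses beam search, compressed vectors held in RAM, and carefully laid-out SSD pages.

```python
import numpy as np

def disk_style_search(vectors, neighbors_on_disk, entry, query, cache):
    """Greedy walk where neighbor lists live 'on disk'; count how many reads are needed."""
    disk_reads = 0
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        if current not in cache:               # neighbor list not cached: simulate one disk read
            cache[current] = neighbors_on_disk[current]
            disk_reads += 1
        improved = False
        for cand in cache[current]:
            d = np.linalg.norm(vectors[cand] - query)
            if d < current_dist:
                current, current_dist, improved = cand, d, True
        if not improved:
            return current, disk_reads

vecs = {i: v for i, v in enumerate(np.array([[0, 0], [1, 0], [2, 1], [3, 3], [5, 5]], dtype=float))}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cache = {0: graph[0]}                          # pretend the entry point's neighbors are pre-cached
print(disk_style_search(vecs, graph, entry=0, query=np.array([4.0, 4.0]), cache=cache))
```

Each uncached hop translates into a small random read, which is why the query-phase I/O profile reported later in this post is dominated by small random reads rather than large sequential ones.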
HNSW versus DiskANN
In summary, HNSW is highly efficient for datasets that can fit within the server’s cache (RAM), leveraging fast memory access to speed up the search for nearest neighbors in high-dimensional space. However, its effectiveness is bounded by the amount of RAM available, which can limit its use in extremely large datasets.
On the other hand, DiskANN is designed to handle situations where the dataset is too large to fit into RAM. It uses clever strategies to minimize the performance penalties of having to fetch data from slower disk storage, thereby extending the potential size of the dataset to the limits of disk capacity. This makes DiskANN suitable for massive datasets, trading off some speed for the ability to handle larger amounts of data.
Methodology
We started our setup by deploying a Milvus standalone instance using a shell script, available at https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh. The script spins up a set of three containers, which constitute the Milvus database service.
Next, we measured the performance of the Milvus database instance using two datasets. The OpenAI dataset contains 5 million vectors with 1,536 dimensions each and was tested with the DiskANN index. The LAION dataset contains 10 million vectors with 768 dimensions each and was tested with the HNSW index; this dataset was also used for the comparison of Milvus versus pgvecto.rs.
The measurement with the DiskANN index focused on understanding the I/O characteristics of that type of index. The measurement with the HNSW index focused on checking whether there would be any I/O at all, because HNSW is an in-memory index, and it served as the basis for the performance comparison between Milvus and pgvecto.rs.
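For readers who want to reproduce a similar setup, here is a minimal pymilvus sketch: it connects to the standalone instance started by the script above, creates a collection sized for 1,536-dimensional vectors (the OpenAI dataset shape), and builds a DiskANN index. The collection and field names are our own illustrative choices, and VectorDB-Bench performs these steps itself during a run; the sketch only shows what those steps look like.

```python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

# Connect to the Milvus standalone instance (default port used by the standalone script).
connections.connect(alias="default", host="localhost", port="19530")

# Schema matching the OpenAI dataset shape used in our runs: 1,536-dimensional float vectors.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
]
collection = Collection(name="openai_5m_demo", schema=CollectionSchema(fields))

# Build a DiskANN index; switching index_type to "HNSW" (with M/efConstruction params)
# is essentially all that changes for the in-memory configuration.
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "DISKANN", "metric_type": "L2", "params": {}},
)
collection.load()
```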
To capture the I/O characteristics of the database during the vectordb-bench process, we recorded the start and end dates and times for each run and generated an ONTAP performance archive corresponding to the measurement periods.
When the Milvus measurements were completed, we switched the database to PostgreSQL running with pgvecto.rs 0.2.0.
A note about the index types we used in our measurements: for Milvus, which supports both HNSW and DiskANN, we collected measurements with both indexes. At the time that we measured performance, pgvecto.rs didn’t support DiskANN, so we collected measurements with HNSW only.
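The corresponding pgvecto.rs setup is plain SQL on PostgreSQL. The sketch below, driven from Python with psycopg2, shows the general shape, assuming pgvecto.rs 0.2.x syntax for the extension, the vector column type, and an HNSW index; table and column names are illustrative, and the exact index options should be checked against the pgvecto.rs documentation linked in the references.

```python
import psycopg2

# Assumes a PostgreSQL instance with the pgvecto.rs extension installed (0.2.x-era syntax).
conn = psycopg2.connect("dbname=vectors user=postgres host=localhost")
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vectors;")
# 768 dimensions to match the LAION dataset used in the comparison.
cur.execute("CREATE TABLE IF NOT EXISTS laion_demo (id bigserial PRIMARY KEY, embedding vector(768));")
# HNSW index; the TOML-style options string follows the pgvecto.rs docs (verify for your version).
cur.execute(
    "CREATE INDEX IF NOT EXISTS laion_demo_hnsw ON laion_demo "
    "USING vectors (embedding vector_l2_ops) WITH (options = $$[indexing.hnsw]$$);"
)
# Nearest-neighbor query by L2 distance.
cur.execute(
    "SELECT id FROM laion_demo ORDER BY embedding <-> %s::vector LIMIT 10;",
    ("[" + ",".join(["0"] * 768) + "]",),
)
print(cur.fetchall())
```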
Results and lessons learned
Results
First, let’s examine the performance of Milvus and pgvecto.rs using the HNSW index. Pgvecto.rs delivered 1,068 queries per second (QPS) with a recall rate of 0.6344, whereas Milvus managed 106 QPS but achieved a higher recall of 0.9842. In terms of 99th percentile latency, Milvus was marginally better than pgvecto.rs.
From the perspective of storage, there was no disk I/O, which aligns with expectations, because the index is memory-based and was completely loaded into RAM.
According to the benchmark results, when precision in query results is important, Milvus is superior to pgvecto.rs because it retrieves a higher proportion of relevant items for each query.
When query throughput is the priority, pgvecto.rs outperforms Milvus in terms of QPS. However, the relevance of the retrieved data is compromised: about 37% of the results are not pertinent to the specified query.
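For readers unfamiliar with how these two headline numbers relate, the short Python sketch below computes recall@k and QPS the way a benchmark like VectorDB-Bench conceptually does: recall is the fraction of the ground-truth top-k that the query actually returned, and QPS is simply queries divided by wall-clock time. The numbers and result sets are made up for illustration.

```python
import time

def recall_at_k(returned_ids, ground_truth_ids):
    """Fraction of the ground-truth top-k that the query actually returned."""
    return len(set(returned_ids) & set(ground_truth_ids)) / len(ground_truth_ids)

# Toy example: a recall of 0.6 means 40% of the returned items were not in the true top-k,
# which is the kind of trade-off behind the 0.6344 recall / 1,068 QPS result above.
print(recall_at_k([1, 2, 3, 4, 9], [1, 2, 3, 5, 6]))   # -> 0.6

def measure_qps(run_query, queries):
    """Run all queries back to back and report queries per second."""
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    return len(queries) / (time.perf_counter() - start)
```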
Let’s now examine Milvus using the DiskANN index. Milvus reached 10.93 QPS with a recall rate of 0.9987 and a 99th percentile latency of 708.2 milliseconds. Notably, the host CPU, operating at full capacity throughout, was the primary bottleneck.
From a storage point of view, the data ingestion and post-insert optimization phase primarily involved a mix of read and write operations, predominantly writes, with an average I/O size of 64KB. During the query phase, the workload consisted entirely of random read operations, with an average I/O size of 8KB.
Lessons learned
In reviewing the index implementations for vector databases, HNSW emerges as the predominant type, largely due to its established presence. DiskANN, being a newer technology, is not yet as universally adopted. However, as generative AI applications expand and the associated data grows, more developers are integrating DiskANN options into vector databases.
DiskANN is increasingly important for managing large, high-dimensional datasets that exceed RAM capacities, and it is gaining traction in the market. Its disk I/O profile is well suited for modern flash-based storage systems, like NetApp AFF A-Series and C-Series, ensuring that it handles large data volumes efficiently.
References
[1] VectorDB Benchmark. https://github.com/zilliztech/VectorDBBench
[2] Milvus Vector Database. https://milvus.io/docs
[3] Postgres pgvecto.rs Database. https://docs.pgvecto.rs/getting-started/overview.html