Tech ONTAP Blogs
As the digital age relentlessly advances, the realm of artificial intelligence (AI) continues to expand, with generative AI carving out a significant niche in the industry. At the heart of many generative AI applications lies the need for efficient and effective management of high-dimensional data—enter vector databases. These specialized databases, such as Milvus, are tailored to handle vector embeddings, which are central to tasks ranging from recommendation systems to natural language processing. The importance of vector databases in generative AI cannot be overstated; they provide the foundational infrastructure that enables AI models to swiftly retrieve and compare complex data points, turning vast seas of information into actionable insights.
Deploying these databases in a cloud environment, especially on platforms like Google Cloud’s Google Kubernetes Engine (GKE), backed by robust storage solutions such as Google Cloud NetApp Volumes (NetApp Volumes), ensures that generative AI applications are not only scalable but also maintain high performance and reliability. The right infrastructure is crucial—it must be able to handle the intense workloads and data throughput that generative AI demands, and be flexible enough to adapt to the ever-evolving landscape of AI technologies.
But how can you ensure that your deployment is primed to meet these demands? This is where performance testing comes into play—specifically the use of ANN-Benchmarks. By running these benchmarks on your deployment, you can glean invaluable insights into how different vector databases and machine learning algorithms will perform under various conditions. This data is indispensable when it comes to making informed decisions about which combination of technologies will best serve your generative AI applications.
In this blog, we’ll dive into the intricacies of deploying the vector database Milvus on GKE, backed by NetApp Volumes storage, and how to employ ANN-Benchmarks to fine-tune your infrastructure. Whether you’re a seasoned data scientist or just venturing into the world of generative AI, understanding the interaction between these components is key to unlocking the full potential of your AI applications.
If you’re following along with your own deployment, make sure you have the following infrastructure already deployed:
- A GKE cluster (ours uses n2-standard-32 instances) with NetApp Trident installed and a storage class backed by Google Cloud NetApp Volumes
- A Google Cloud NetApp Volumes storage pool (ours uses the Extreme service level)
- A Linux workstation VM (ours runs Ubuntu) on the same VPC network as the GKE cluster
Depending on your generative AI and vector database use case, feel free to use different-sized GKE and VM instances and NetApp Volumes storage pools. However, be aware of the throughput limits for different NetApp Volumes service levels, and size the Milvus volumes (as detailed in the next section) appropriately in relation to your GKE instance sizes and generative AI application requirements.
To deploy Milvus to our GKE cluster, we’ll use the Kubernetes helm chart. SSH to your VM instance mentioned in the prerequisites, and then run the following commands:
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
To see all possible configuration values for the Milvus helm chart, run the following command:
helm show values milvus/milvus
Depending on your generative AI application and/or vector database use case, you may want to customize the default deployment values. The Milvus sizing tool is a great resource for determining the appropriate values.
For the performance benchmarks, we’ll use the default deployment, other than changing the persistent volume size and specifying an internal IP load balancer service. Run the following command in your workstation terminal to create the helm values file:
cat <<EOF > values.yaml
minio:
  persistence:
    size: 8Ti
etcd:
  persistence:
    size: 1Ti
service:
  type: LoadBalancer
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
EOF
For this deployment, MinIO object storage creates four pods, each with a corresponding 8TiB volume. This provides 32TiB of storage, which corresponds to roughly 32Gbps of total throughput with NetApp Volumes, matching the network bandwidth of the n2-standard-32 instances chosen for our GKE cluster.
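As a sanity check on that sizing, here’s the arithmetic as a minimal Python sketch. It assumes that the Extreme service level delivers roughly 128MiB/s of throughput per provisioned TiB; that rate is our assumption, so check the NetApp Volumes documentation for current service-level figures.

# Assumption: NetApp Volumes Extreme service level ~= 128 MiB/s per TiB.
MIB_PER_SEC_PER_TIB = 128

minio_pods = 4
volume_size_tib = 8

total_tib = minio_pods * volume_size_tib              # 32 TiB provisioned
total_mib_per_sec = total_tib * MIB_PER_SEC_PER_TIB   # 4096 MiB/s
total_gbps = total_mib_per_sec * 1.048576 * 8 / 1000  # MiB/s -> Gbit/s

print(f"{total_tib} TiB -> ~{total_gbps:.0f} Gbps")   # ~34 Gbps, roughly the
                                                      # 32 Gbps of an n2-standard-32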
The service is configured to deploy a load balancer, rather than the default cluster IP, to enable access to the vector database from outside the Kubernetes cluster. Additionally, we’re specifying an internal annotation, because we don’t need a public IP address since our client VM lives on the same network as our GKE cluster.
Before we deploy Milvus, let’s ensure that our NetApp Volumes Extreme storage class is set as default:
$ kubectl get sc
NAME                            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE       ALLOWVOLUMEEXPANSION   AGE
netapp-gcnv-extreme (default)   csi.trident.netapp.io   Delete          Immediate               true                   19m
premium-rwo                     pd.csi.storage.gke.io   Delete          WaitForFirstConsumer    true                   45m
standard                        kubernetes.io/gce-pd    Delete          Immediate               true                   45m
standard-rwo                    pd.csi.storage.gke.io   Delete          WaitForFirstConsumer    true                   45m
Please see this Kubernetes documentation page if it needs to be changed.
We’re now ready to deploy Milvus with the following command:
helm install milvus -n milvus --create-namespace milvus/milvus -f values.yaml
After 5 to 10 minutes, you should see all the persistent volumes go into a Bound state:
$ kubectl -n milvus get pvc
NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
data-milvus-etcd-0                                       Bound    pvc-0368c3d3-e6cc-4370-ac48-91036a6ded58   1Ti        RWO            netapp-gcnv-extreme   9m
data-milvus-etcd-1                                       Bound    pvc-be2bf3a6-4fd2-4bed-b011-60ef5cce578b   1Ti        RWO            netapp-gcnv-extreme   8m59s
data-milvus-etcd-2                                       Bound    pvc-b9c86600-0286-42f4-87a4-4d64365f3efc   1Ti        RWO            netapp-gcnv-extreme   8m59s
export-milvus-minio-0                                    Bound    pvc-9a6748d2-75fe-4e42-85c2-8f7649443955   8Ti        RWO            netapp-gcnv-extreme   8m58s
export-milvus-minio-1                                    Bound    pvc-9c857965-52c5-4311-92bb-44c1f49d4171   8Ti        RWO            netapp-gcnv-extreme   8m57s
export-milvus-minio-2                                    Bound    pvc-90a95d75-db91-41ff-9688-2e7f93eec3e2   8Ti        RWO            netapp-gcnv-extreme   8m57s
export-milvus-minio-3                                    Bound    pvc-b4510fd9-b8dc-4127-8f27-1216def01e1a   8Ti        RWO            netapp-gcnv-extreme   8m56s
milvus-pulsar-bookie-journal-milvus-pulsar-bookie-0      Bound    pvc-c0868fbc-d298-46e5-bb53-51c43ea4a105   100Gi      RWO            netapp-gcnv-extreme   9m
milvus-pulsar-bookie-journal-milvus-pulsar-bookie-1      Bound    pvc-68ca479f-8664-4d25-a9ac-eb681577b2b1   100Gi      RWO            netapp-gcnv-extreme   8m59s
milvus-pulsar-bookie-journal-milvus-pulsar-bookie-2      Bound    pvc-2b7cc601-1edb-4810-b557-8d5f3669dbd1   100Gi      RWO            netapp-gcnv-extreme   8m58s
milvus-pulsar-bookie-ledgers-milvus-pulsar-bookie-0      Bound    pvc-027d4f94-8508-4751-90c1-ff8d78ca5852   200Gi      RWO            netapp-gcnv-extreme   8m59s
milvus-pulsar-bookie-ledgers-milvus-pulsar-bookie-1      Bound    pvc-f49f3a2f-ece9-436f-8d12-2a47f74930f4   200Gi      RWO            netapp-gcnv-extreme   8m58s
milvus-pulsar-bookie-ledgers-milvus-pulsar-bookie-2      Bound    pvc-2be66361-0038-4218-a207-b3806630f640   200Gi      RWO            netapp-gcnv-extreme   8m58s
milvus-pulsar-zookeeper-data-milvus-pulsar-zookeeper-0   Bound    pvc-ac3f3bcf-1250-408d-91a3-13913aa17dcb   100Gi      RWO            netapp-gcnv-extreme   9m
milvus-pulsar-zookeeper-data-milvus-pulsar-zookeeper-1   Bound    pvc-87744579-1497-4639-b625-92ca9524a2d7   100Gi      RWO            netapp-gcnv-extreme   7m12s
milvus-pulsar-zookeeper-data-milvus-pulsar-zookeeper-2   Bound    pvc-6d8ca512-414d-419e-b243-e64ce3a9a991   100Gi      RWO            netapp-gcnv-extreme   6m23s
After the volumes are bound, all the pods should be in a Running or Completed state:
$ kubectl -n milvus get pods
NAME                                 READY   STATUS      RESTARTS        AGE
milvus-datacoord-77c6d77696-7nrr6    1/1     Running     6 (7m45s ago)   10m
milvus-datanode-69ccf878d9-zcbtm     1/1     Running     5 (6m34s ago)   10m
milvus-etcd-0                        1/1     Running     0               10m
milvus-etcd-1                        1/1     Running     0               10m
milvus-etcd-2                        1/1     Running     0               10m
milvus-indexcoord-676bcc7f95-pkvvw   1/1     Running     0               10m
milvus-indexnode-54d749985d-2h9rk    1/1     Running     4 (7m52s ago)   10m
milvus-minio-0                       1/1     Running     0               10m
milvus-minio-1                       1/1     Running     0               10m
milvus-minio-2                       1/1     Running     0               10m
milvus-minio-3                       1/1     Running     0               10m
milvus-proxy-7dbdb67859-stvc9        1/1     Running     5 (6m34s ago)   10m
milvus-pulsar-bookie-0               1/1     Running     0               10m
milvus-pulsar-bookie-1               1/1     Running     0               10m
milvus-pulsar-bookie-2               1/1     Running     0               10m
milvus-pulsar-bookie-init-qdtrt      0/1     Completed   0               10m
milvus-pulsar-broker-0               1/1     Running     0               10m
milvus-pulsar-proxy-0                1/1     Running     0               10m
milvus-pulsar-pulsar-init-5ztw8      0/1     Completed   0               10m
milvus-pulsar-recovery-0             1/1     Running     0               10m
milvus-pulsar-zookeeper-0            1/1     Running     0               10m
milvus-pulsar-zookeeper-1            1/1     Running     0               10m
milvus-pulsar-zookeeper-2            1/1     Running     0               10m
milvus-querycoord-5c7f74f66b-p9zjp   1/1     Running     5 (6m34s ago)   10m
milvus-querynode-5fb8bb84b4-wgbkh    1/1     Running     6 (7m48s ago)   10m
milvus-rootcoord-8579f69976-xvvbn    1/1     Running     4 (8m53s ago)   10m
Finally, view the services, and take note of the Milvus load balancer external IP (10.10.0.9 in this output), because we’ll need that in the next section.
$ kubectl -n milvus get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
milvus                    LoadBalancer   172.17.19.214    10.10.0.9     19530:30681/TCP,9091:31024/TCP        13m
milvus-datacoord          ClusterIP      172.17.194.168   <none>        13333/TCP,9091/TCP                    13m
milvus-datanode           ClusterIP      None             <none>        9091/TCP                              13m
milvus-etcd               ClusterIP      172.17.199.209   <none>        2379/TCP,2380/TCP                     13m
milvus-etcd-headless      ClusterIP      None             <none>        2379/TCP,2380/TCP                     13m
milvus-indexcoord         ClusterIP      172.17.135.240   <none>        31000/TCP,9091/TCP                    13m
milvus-indexnode          ClusterIP      None             <none>        9091/TCP                              13m
milvus-minio              ClusterIP      172.17.57.73     <none>        9000/TCP                              13m
milvus-minio-svc          ClusterIP      None             <none>        9000/TCP                              13m
milvus-pulsar-bookie      ClusterIP      None             <none>        3181/TCP,8000/TCP                     13m
milvus-pulsar-broker      ClusterIP      None             <none>        8080/TCP,6650/TCP                     13m
milvus-pulsar-proxy       ClusterIP      172.17.149.7     <none>        8080/TCP,6650/TCP                     13m
milvus-pulsar-recovery    ClusterIP      None             <none>        8000/TCP                              13m
milvus-pulsar-zookeeper   ClusterIP      None             <none>        8000/TCP,2888/TCP,3888/TCP,2181/TCP   13m
milvus-querycoord         ClusterIP      172.17.199.168   <none>        19531/TCP,9091/TCP                    13m
milvus-querynode          ClusterIP      None             <none>        9091/TCP                              13m
milvus-rootcoord          ClusterIP      172.17.1.96      <none>        53100/TCP,9091/TCP                    13m
Now that Milvus has been successfully deployed, we’re ready to start our performance testing with ANN-Benchmarks.
Approximate nearest neighbor (ANN) algorithms are a critical component of vector search, particularly with high-dimensional data that is typical in generative AI applications. These algorithms provide a pragmatic balance between accuracy and computational efficiency, enabling the retrieval of data points that are close to a given query point without exhaustively comparing it to every other point in the dataset. Unlike exact nearest neighbor searches, which can be computationally prohibitive as the dimensionality and size of the dataset grow, ANN algorithms employ various strategies such as hashing, trees, or graph-based approaches to quickly narrow down the search space. This approach allows rapid query responses even within vast datasets, making them valuable for real-time applications that rely on similarity search, such as content recommendation, image and voice recognition, and natural language processing.
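To make that contrast concrete, here’s what the exhaustive approach looks like. This purely illustrative NumPy snippet performs the exact O(N·d) scan per query that ANN indexes are designed to avoid; the dataset is random and stands in for real embeddings.

import numpy as np

# 100,000 random 100-dimensional vectors standing in for real embeddings.
rng = np.random.default_rng(0)
dataset = rng.standard_normal((100_000, 100)).astype(np.float32)
query = rng.standard_normal(100).astype(np.float32)

# Exact nearest-neighbor search: compute the distance from the query to
# every vector in the dataset, then take the 10 smallest.
distances = np.linalg.norm(dataset - query, axis=1)
top10 = np.argsort(distances)[:10]
print(top10)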
ANN-Benchmarks is a popular tool for evaluating the performance of various ANN implementations. It provides a standard framework for assessing how different algorithms and vector databases perform on common tasks and datasets. By running ANN-Benchmarks, developers can compare the query time, accuracy, and memory usage of their vector database against other similar solutions. This benchmarking is crucial because it sheds light on the scalability and efficiency of the ANN algorithms underpinning the vector database, which directly affects the responsiveness and viability of generative AI applications.
Using the data from ANN-Benchmarks, developers can make informed decisions about their generative AI applications. For instance, if the benchmarks show that a particular database offers the fastest query times with acceptable accuracy, it might be the best choice for applications in which real-time performance is most critical. On the other hand, if memory usage is a concern, a more memory-efficient database might be preferable, even if it means slightly longer query times. Ultimately, the insights gained from ANN-Benchmarks allow developers to strike the right balance among speed, accuracy, and resource consumption, ensuring that their generative AI application can operate effectively at the desired scale.
To get started with ANN-Benchmarks, SSH to your workstation VM and run the following commands:
git clone https://github.com/erikbern/ann-benchmarks.git
cd ann-benchmarks
Next, we’ll install Python 3.10, because it’s the validated version of Python for ANN-Benchmarks (these commands are for Ubuntu, and will vary across Linux distributions):
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
sudo apt install -y python3.10 python3.10-distutils python3.10-venv
We’ll now set up our Python virtual environment and install the necessary Python packages:
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install pymilvus
Finally, we’ll update the Milvus module to point at our Milvus load balancer service. Use your favorite text editor to open the following file:
vim ann_benchmarks/algorithms/milvus/module.py
Update line 25:
self.connects.connect("default", host='localhost', port='19530')
Replace localhost with the external IP from the Milvus load balancer service gathered in the previous section (your IP will very likely be different):
self.connects.connect("default", host='10.10.0.9', port='19530')
When the update is complete, save and quit your text editor.
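Before kicking off a multi-hour benchmark run, it’s worth a quick connectivity check. This minimal sketch uses the pymilvus package we installed earlier; replace the IP address with your own load balancer address.

from pymilvus import connections, utility

# Connect to Milvus through the internal load balancer external IP
# gathered in the previous section.
connections.connect("default", host="10.10.0.9", port="19530")

# If the connection is healthy, this prints the Milvus server version.
print(utility.get_server_version())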
We’re now ready to start our performance benchmarks with the following command (this command took around 8 hours in our environment):
python run.py --algorithm milvus-hnsw --local
The algorithm argument instructs ANN-Benchmarks to run the Milvus vector database tests with the HNSW (hierarchical navigable small world) vector index, a graph-based index that delivers very fast queries at the cost of high memory usage. The local argument instructs ANN-Benchmarks to run the tests “locally” rather than in a Docker container; because we’re pointing at our external Milvus instance, however, these tests aren’t truly local.
For this test, we’re omitting the dataset argument, so ANN-Benchmarks falls back to the default GloVe 100 Angular dataset. This dataset is a collection of 100-dimensional word vectors from the Global Vectors for Word Representation project. “Angular” refers to the use of angular distance for measuring the similarity between vectors, focusing on the direction rather than the magnitude of the vectors. This dataset helps in evaluating the performance of vector databases in terms of speed and accuracy for tasks involving semantic word searches.
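To make “angular” concrete, the snippet below shows one common way to compute the angular distance between two vectors. It’s illustrative only; ANN-Benchmarks’ internal implementation may differ in detail.

import numpy as np

def angular_distance(a, b):
    # Angular distance depends only on the angle between the vectors,
    # not their magnitudes: normalize, take the cosine, then the arccos.
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_sim, -1.0, 1.0))

a = np.array([1.0, 0.0])
b = np.array([10.0, 10.0])     # 45 degrees from a, much larger magnitude
print(angular_distance(a, b))  # ~0.785 rad; magnitude plays no role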
The output of the command should look like this:
$ python run.py --algorithm milvus-hnsw --local
downloading https://ann-benchmarks.com/glove-100-angular.hdf5 -> data/glove-100-angular.hdf5...
2024-08-24 01:27:27,035 - annb - INFO - running only milvus-hnsw
2024-08-24 01:27:27,377 - annb - INFO - Order: [Definition(algorithm='milvus-hnsw', constructor='MilvusHNSW', module='ann_benchmarks.algorithms.milvus', docker_tag='ann-benchmarks-milvus', arguments=['angular', 100, {'M': 16, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [600], [800]], disabled=False), Definition(algorithm='milvus-hnsw', constructor='MilvusHNSW', module='ann_benchmarks.algorithms.milvus', docker_tag='ann-benchmarks-milvus', arguments=['angular', 100, {'M': 48, 'efConstruction': 500}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [600], [800]], disabled=False), Definition(algorithm='milvus-hnsw', constructor='MilvusHNSW', module='ann_benchmarks.algorithms.milvus', docker_tag='ann-benchmarks-milvus', arguments=['angular', 100, {'M': 64, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [600], [800]], disabled=False),
...
After the GloVe 100 Angular dataset is downloaded, the list of tests to be executed is printed out. These tests vary a set of arguments: the HNSW build-time parameters M and efConstruction, and the query-time search width ef (the query_argument_groups in the output above). The sketch below shows where those parameters fit.
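For context, this is roughly how those parameters map onto a Milvus index when using pymilvus directly; ANN-Benchmarks issues equivalent calls for you, sweeping the parameter values across runs. The collection and field names here are hypothetical, and we assume a collection with a vector field already exists.

from pymilvus import connections, Collection

connections.connect("default", host="10.10.0.9", port="19530")
collection = Collection("example_collection")  # hypothetical collection

# Build-time parameters: M (graph connectivity) and efConstruction
# (build-time search width) trade build time and memory for recall.
collection.create_index(
    field_name="embedding",  # hypothetical vector field
    index_params={
        "index_type": "HNSW",
        "metric_type": "L2",
        "params": {"M": 16, "efConstruction": 200},
    },
)
collection.load()

# Query-time parameter: ef widens the candidate search, trading
# queries per second for recall.
results = collection.search(
    data=[[0.0] * 100],  # one 100-dimensional query vector
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"ef": 80}},
    limit=10,
)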
After the tests have been running for a period of time, open the metrics explorer page of the Google Cloud console, and type netapp.googleapis.com in the Select a Metric field. Select the NetApp Volumes operations count metric, and optionally add a filter depending on the number of volumes in your environment. Ensure that the persistent volume claims created during the deployment of Milvus are being utilized.
After several more hours, our tests will be complete, and we can move on to our next dataset.
We’ll now run our next set of benchmarks, this time specifying the Sift 128 Euclidean dataset. This dataset consists of 128-dimensional vectors obtained using the scale-invariant feature transform algorithm for image feature extraction. It uses the Euclidean distance metric, which is the standard “straight-line” distance in the multidimensional space. The Sift dataset is essential for evaluating image matching, object recognition, and other computer vision tasks that require quick and reliable feature comparison.
Back on your workstation, run the following command, and take note of the additional dataset argument:
python run.py --algorithm milvus-hnsw --local --dataset sift-128-euclidean
In our testing, this command took about 5 hours. When it’s complete, we can move on to the analysis of the results.
As detailed in the ANN-Benchmarks readme, there are several ways to analyze the results of the previous tests: plotting recall/QPS curves with the plot.py script, exporting all results to CSV with the data_export.py script, and generating a set of interactive charts with the create_website.py script.
Feel free to use any of these methods; however, we’ll be using the create_website script, because it’s simple but still provides a large amount of insight. From your workstation VM, run the following command:
python create_website.py
If your VM has a desktop environment, you can open the milvus-hnsw.html file. Otherwise, copy it to your physical machine by using the following command:
scp <user>@<ip>:/home/<user>/ann-benchmarks/milvus-hnsw.html milvus-hnsw.html
This HTML page contains about a dozen graphs (which are included in the appendix). Let’s dig into a couple of them here.
This chart helps visualize the performance of Milvus (using an HNSW vector index) when searching two different datasets. The axes represent recall and queries per second:
As mentioned in the chart label, trendlines up and to the right are better, meaning that Milvus HNSW performed consistently better with the Sift 128 Euclidean dataset than with the GloVe 100 Angular dataset. This result suggests that Milvus HNSW is better suited to computer vision tasks than to natural language processing. However, we recommend testing against additional datasets and vector databases to find the best match for your specific application.
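For reference, the recall value on the x-axis measures the fraction of the true nearest neighbors that the approximate search actually returned. A minimal sketch of the computation:

def recall_at_k(retrieved_ids, true_ids, k):
    # Fraction of the true k nearest neighbors present in the ANN results.
    return len(set(retrieved_ids[:k]) & set(true_ids[:k])) / k

# The ANN search returned 3 of the 4 true nearest neighbors -> recall 0.75.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))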
Let’s view another graph:
Instead of purely showing results from the query phase as in the previous chart, this chart shows the trade-off between the amount of time it takes to build the index (y-axis) and the recall value (x-axis). Build time is a one-time cost to create the data structure and must be incurred before any queries can be processed.
Some vector databases and their underlying algorithms have a very fast build time but yield lower recall, making them suitable for applications where the index needs to be built (or rebuilt) quickly and frequently, and where perfect accuracy isn’t critical. On the other hand, some algorithms might take longer to build their indexes but provide higher recall, making them a better choice for applications where the accuracy of search results is most critical and the index doesn’t need to be updated as often.
In summary, the deployment of Milvus on Google Kubernetes Engine with Google Cloud NetApp Volumes showcases a powerful, scalable solution for managing high-dimensional vector data essential to generative AI. The integration of these technologies, tested rigorously through ANN-Benchmarks, provides clear insights into optimizing performance for AI applications.
The results confirm the necessity of selecting robust infrastructure to meet the demands of vector databases. GKE and NetApp Volumes offer the scalability and flexibility required for the evolving landscape of AI, so that as data and model complexities grow, the system remains capable and efficient.
This exploration into Milvus on GKE with Google Cloud NetApp Volumes equips developers and data scientists with valuable knowledge to fine-tune their generative AI applications, ensuring that they remain at the cutting edge of AI innovation. As AI progresses, the combination of advanced vector databases and cloud-native technologies will continue to be instrumental in driving forward the next generation of AI advancements.