Tech ONTAP Blogs

Optimizing vector DBs for generative AI: Deploying Milvus on GKE with Google Cloud NetApp Volumes

MichaelHaigh
NetApp

As the digital age relentlessly advances, the realm of artificial intelligence (AI) continues to expand, with generative AI carving out a significant niche in the industry. At the heart of many generative AI applications lies the need for efficient and effective management of high-dimensional data—enter vector databases. These specialized databases, such as Milvus, are tailored to handle vector embeddings, which are central to tasks ranging from recommendation systems to natural language processing. The importance of vector databases in generative AI cannot be overstated; they provide the foundational infrastructure that enables AI models to swiftly retrieve and compare complex data points, turning vast seas of information into actionable insights.

 

Deploying these databases in a cloud environment, especially on platforms like Google Cloud’s Google Kubernetes Engine (GKE), backed by robust storage solutions such as Google Cloud NetApp Volumes (NetApp Volumes), ensures that generative AI applications are not only scalable but also maintain high performance and reliability. The right infrastructure is crucial: it must handle the intense workloads and data throughput that generative AI demands, and be flexible enough to adapt to the ever-evolving landscape of AI technologies.

 

But how can you ensure that your deployment is primed to meet these demands? This is where performance testing comes into play—specifically the use of ANN-Benchmarks. By running these benchmarks on your deployment, you can glean invaluable insights into how different vector databases and machine learning algorithms will perform under various conditions. This data is indispensable when it comes to making informed decisions about which combination of technologies will best serve your generative AI applications.

 

In this blog, we’ll dive into the intricacies of deploying the vector database Milvus on GKE, backed by NetApp Volumes storage, and how to employ ANN-Benchmarks to fine-tune your infrastructure. Whether you’re a seasoned data scientist or just venturing into the world of generative AI, understanding the interaction between these components is key to unlocking the full potential of your AI applications.

 

Prerequisites

 

If you’re following along with your own deployment, make sure you have the following infrastructure already deployed:

 

  • A GKE cluster with at least 40 vCPU and 167GiB of memory available (we’re using two n2-standard-32 nodes, which meets this minimum while also providing 32Gbps of network bandwidth), deployed in a region with NetApp Volumes Extreme service level available
  • A Google Cloud NetApp Volumes storage pool with at least 38TiB available in the same region as the GKE cluster, and connected to the GKE network
  • NetApp Astra Trident software installed on the GKE cluster, with a back-end configuration and storage class (set as the default storage class) referencing the NetApp Volumes Extreme storage pool
  • A Linux VM instance (such as Ubuntu) with at least 32Gbps of network bandwidth, created in the same region and network as the GKE cluster, with helm and kubectl installed and configured to access your GKE cluster

Depending on your generative AI and vector database use case, feel free to use different-sized GKE and VM instances and NetApp Volumes storage pools. However, be aware of the throughput limits for different NetApp Volumes service levels, and size the Milvus volumes (as detailed in the next section) appropriately in relation to your GKE instance sizes and generative AI application requirements.

 

Install Milvus

 

To deploy Milvus to our GKE cluster, we’ll use the official Milvus Helm chart. SSH to the VM instance mentioned in the prerequisites, and then run the following commands:

 

helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update

 

To see all possible configuration values for the Milvus helm chart, run the following command:

 

helm show values milvus/milvus

 

Depending on your generative AI application and/or vector database use case, you may want to customize the default deployment values. The Milvus sizing tool is a great resource for determining the appropriate values.

 

For the performance benchmarks, we’ll use the default deployment, other than changing the persistent volume size and specifying an internal IP load balancer service. Run the following command in your workstation terminal to create the helm values file:

 

cat <<EOF > values.yaml
minio:
  persistence:
    size: 8Ti
etcd:
  persistence:
    size: 1Ti
service:
  type: LoadBalancer
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
EOF

 

For this deployment, the MinIO object storage component creates four pods with four corresponding (8TiB) volumes. This provides 32TiB of storage, which corresponds to 32Gbps of total throughput with NetApp Volumes, matching the network bandwidth of the n2-standard-32 instances chosen for our GKE cluster.
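As a rough sanity check, the sizing arithmetic above can be sketched in a few lines. The 1Gbps-per-TiB scaling factor is an assumption inferred from the figures in this post; consult the NetApp Volumes service level documentation for exact Extreme-tier throughput numbers.

```python
# Sizing sketch for the MinIO layer. The throughput-per-TiB factor is an
# assumption based on the numbers quoted in this post, not an official
# NetApp Volumes specification.
MINIO_PODS = 4
VOLUME_SIZE_TIB = 8
GBPS_PER_TIB = 1  # assumed Extreme service level scaling factor

total_capacity_tib = MINIO_PODS * VOLUME_SIZE_TIB
total_throughput_gbps = total_capacity_tib * GBPS_PER_TIB

# 32 TiB of capacity yields roughly 32 Gbps, enough to saturate the
# network bandwidth of a single n2-standard-32 node.
print(f"{total_capacity_tib} TiB -> ~{total_throughput_gbps} Gbps")
```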

 

The service is configured to deploy a load balancer, rather than the default cluster IP, to enable access to the vector database from outside the Kubernetes cluster. Additionally, we’re specifying the internal load balancer annotation; we don’t need a public IP address, because our client VM lives on the same network as our GKE cluster.

 

Before we deploy Milvus, let’s ensure that our NetApp Volumes Extreme storage class is set as default:

 

$ kubectl get sc
NAME                            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
netapp-gcnv-extreme (default)   csi.trident.netapp.io   Delete          Immediate              true                   19m
premium-rwo                     pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   45m
standard                        kubernetes.io/gce-pd    Delete          Immediate              true                   45m
standard-rwo                    pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   45m

 

Please see this Kubernetes documentation page if it needs to be changed.
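If it does need to be changed, a typical approach is to flip the is-default-class annotation on both classes (the storage class names below match our environment; adjust them to yours):

```shell
# Clear the default flag on the old default, then set it on the
# Trident-backed NetApp Volumes class.
kubectl patch storageclass standard-rwo \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
kubectl patch storageclass netapp-gcnv-extreme \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```

Only one storage class should carry the annotation at a time, which is why the old default is cleared first.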

 

We’re now ready to deploy Milvus with the following command:

 

helm install milvus -n milvus --create-namespace milvus/milvus -f values.yaml

 

After 5 to 10 minutes, you should see all the persistent volumes go into a Bound state:

 

$ kubectl -n milvus get pvc
NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
data-milvus-etcd-0                                       Bound    pvc-0368c3d3-e6cc-4370-ac48-91036a6ded58   1Ti        RWO            netapp-gcnv-extreme   9m
data-milvus-etcd-1                                       Bound    pvc-be2bf3a6-4fd2-4bed-b011-60ef5cce578b   1Ti        RWO            netapp-gcnv-extreme   8m59s
data-milvus-etcd-2                                       Bound    pvc-b9c86600-0286-42f4-87a4-4d64365f3efc   1Ti        RWO            netapp-gcnv-extreme   8m59s
export-milvus-minio-0                                    Bound    pvc-9a6748d2-75fe-4e42-85c2-8f7649443955   8Ti        RWO            netapp-gcnv-extreme   8m58s
export-milvus-minio-1                                    Bound    pvc-9c857965-52c5-4311-92bb-44c1f49d4171   8Ti        RWO            netapp-gcnv-extreme   8m57s
export-milvus-minio-2                                    Bound    pvc-90a95d75-db91-41ff-9688-2e7f93eec3e2   8Ti        RWO            netapp-gcnv-extreme   8m57s
export-milvus-minio-3                                    Bound    pvc-b4510fd9-b8dc-4127-8f27-1216def01e1a   8Ti        RWO            netapp-gcnv-extreme   8m56s
milvus-pulsar-bookie-journal-milvus-pulsar-bookie-0      Bound    pvc-c0868fbc-d298-46e5-bb53-51c43ea4a105   100Gi      RWO            netapp-gcnv-extreme   9m
milvus-pulsar-bookie-journal-milvus-pulsar-bookie-1      Bound    pvc-68ca479f-8664-4d25-a9ac-eb681577b2b1   100Gi      RWO            netapp-gcnv-extreme   8m59s
milvus-pulsar-bookie-journal-milvus-pulsar-bookie-2      Bound    pvc-2b7cc601-1edb-4810-b557-8d5f3669dbd1   100Gi      RWO            netapp-gcnv-extreme   8m58s
milvus-pulsar-bookie-ledgers-milvus-pulsar-bookie-0      Bound    pvc-027d4f94-8508-4751-90c1-ff8d78ca5852   200Gi      RWO            netapp-gcnv-extreme   8m59s
milvus-pulsar-bookie-ledgers-milvus-pulsar-bookie-1      Bound    pvc-f49f3a2f-ece9-436f-8d12-2a47f74930f4   200Gi      RWO            netapp-gcnv-extreme   8m58s
milvus-pulsar-bookie-ledgers-milvus-pulsar-bookie-2      Bound    pvc-2be66361-0038-4218-a207-b3806630f640   200Gi      RWO            netapp-gcnv-extreme   8m58s
milvus-pulsar-zookeeper-data-milvus-pulsar-zookeeper-0   Bound    pvc-ac3f3bcf-1250-408d-91a3-13913aa17dcb   100Gi      RWO            netapp-gcnv-extreme   9m
milvus-pulsar-zookeeper-data-milvus-pulsar-zookeeper-1   Bound    pvc-87744579-1497-4639-b625-92ca9524a2d7   100Gi      RWO            netapp-gcnv-extreme   7m12s
milvus-pulsar-zookeeper-data-milvus-pulsar-zookeeper-2   Bound    pvc-6d8ca512-414d-419e-b243-e64ce3a9a991   100Gi      RWO            netapp-gcnv-extreme   6m23s

 

After the volumes are bound, all the pods should be in a Running or Completed state:

 

$ kubectl -n milvus get pods
NAME                                 READY   STATUS      RESTARTS        AGE
milvus-datacoord-77c6d77696-7nrr6    1/1     Running     6 (7m45s ago)   10m
milvus-datanode-69ccf878d9-zcbtm     1/1     Running     5 (6m34s ago)   10m
milvus-etcd-0                        1/1     Running     0               10m
milvus-etcd-1                        1/1     Running     0               10m
milvus-etcd-2                        1/1     Running     0               10m
milvus-indexcoord-676bcc7f95-pkvvw   1/1     Running     0               10m
milvus-indexnode-54d749985d-2h9rk    1/1     Running     4 (7m52s ago)   10m
milvus-minio-0                       1/1     Running     0               10m
milvus-minio-1                       1/1     Running     0               10m
milvus-minio-2                       1/1     Running     0               10m
milvus-minio-3                       1/1     Running     0               10m
milvus-proxy-7dbdb67859-stvc9        1/1     Running     5 (6m34s ago)   10m
milvus-pulsar-bookie-0               1/1     Running     0               10m
milvus-pulsar-bookie-1               1/1     Running     0               10m
milvus-pulsar-bookie-2               1/1     Running     0               10m
milvus-pulsar-bookie-init-qdtrt      0/1     Completed   0               10m
milvus-pulsar-broker-0               1/1     Running     0               10m
milvus-pulsar-proxy-0                1/1     Running     0               10m
milvus-pulsar-pulsar-init-5ztw8      0/1     Completed   0               10m
milvus-pulsar-recovery-0             1/1     Running     0               10m
milvus-pulsar-zookeeper-0            1/1     Running     0               10m
milvus-pulsar-zookeeper-1            1/1     Running     0               10m
milvus-pulsar-zookeeper-2            1/1     Running     0               10m
milvus-querycoord-5c7f74f66b-p9zjp   1/1     Running     5 (6m34s ago)   10m
milvus-querynode-5fb8bb84b4-wgbkh    1/1     Running     6 (7m48s ago)   10m
milvus-rootcoord-8579f69976-xvvbn    1/1     Running     4 (8m53s ago)   10m

 

Finally, view the services, and take note of the Milvus load balancer external IP (10.10.0.9 in this output), because we’ll need that in the next section.

 

$ kubectl -n milvus get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
milvus                    LoadBalancer   172.17.19.214    10.10.0.9     19530:30681/TCP,9091:31024/TCP        13m
milvus-datacoord          ClusterIP      172.17.194.168   <none>        13333/TCP,9091/TCP                    13m
milvus-datanode           ClusterIP      None             <none>        9091/TCP                              13m
milvus-etcd               ClusterIP      172.17.199.209   <none>        2379/TCP,2380/TCP                     13m
milvus-etcd-headless      ClusterIP      None             <none>        2379/TCP,2380/TCP                     13m
milvus-indexcoord         ClusterIP      172.17.135.240   <none>        31000/TCP,9091/TCP                    13m
milvus-indexnode          ClusterIP      None             <none>        9091/TCP                              13m
milvus-minio              ClusterIP      172.17.57.73     <none>        9000/TCP                              13m
milvus-minio-svc          ClusterIP      None             <none>        9000/TCP                              13m
milvus-pulsar-bookie      ClusterIP      None             <none>        3181/TCP,8000/TCP                     13m
milvus-pulsar-broker      ClusterIP      None             <none>        8080/TCP,6650/TCP                     13m
milvus-pulsar-proxy       ClusterIP      172.17.149.7     <none>        8080/TCP,6650/TCP                     13m
milvus-pulsar-recovery    ClusterIP      None             <none>        8000/TCP                              13m
milvus-pulsar-zookeeper   ClusterIP      None             <none>        8000/TCP,2888/TCP,3888/TCP,2181/TCP   13m
milvus-querycoord         ClusterIP      172.17.199.168   <none>        19531/TCP,9091/TCP                    13m
milvus-querynode          ClusterIP      None             <none>        9091/TCP                              13m
milvus-rootcoord          ClusterIP      172.17.1.96      <none>        53100/TCP,9091/TCP                    13m

 

Now that Milvus has been successfully deployed, we’re ready to start our performance testing with ANN-Benchmarks.

 

ANN-Benchmarks setup

 

Approximate nearest neighbor (ANN) algorithms are a critical component of vector search, particularly with high-dimensional data that is typical in generative AI applications. These algorithms provide a pragmatic balance between accuracy and computational efficiency, enabling the retrieval of data points that are close to a given query point without exhaustively comparing it to every other point in the dataset. Unlike exact nearest neighbor searches, which can be computationally prohibitive as the dimensionality and size of the dataset grow, ANN algorithms employ various strategies such as hashing, trees, or graph-based approaches to quickly narrow down the search space. This approach allows rapid query responses even within vast datasets, making them valuable for real-time applications that rely on similarity search, such as content recommendation, image and voice recognition, and natural language processing.
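To make the contrast concrete, here's a minimal brute-force exact nearest neighbor search in NumPy, the exhaustive comparison that ANN algorithms are designed to avoid:

```python
import numpy as np

def exact_nearest_neighbors(data, query, k):
    """Brute-force k-NN: compare the query against every vector.

    This is the exhaustive search that ANN indexes approximate. It does
    O(n * d) work per query, which is what becomes prohibitive as the
    dataset's size and dimensionality grow.
    """
    dists = np.linalg.norm(data - query, axis=1)  # distance to every point
    return np.argsort(dists)[:k]                  # indices of the k closest

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 128))  # 10k vectors, SIFT-like dimensionality
query = rng.standard_normal(128)
print(exact_nearest_neighbors(data, query, k=10))
```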

 

ANN-Benchmarks is a popular tool for evaluating the performance of various ANN implementations. It provides a standard framework for assessing how different algorithms and vector databases perform on common tasks and datasets. By running ANN-Benchmarks, developers can compare the query time, accuracy, and memory usage of their vector database against other similar solutions. This benchmarking is crucial because it sheds light on the scalability and efficiency of the ANN algorithms underpinning the vector database, which directly affects the responsiveness and viability of generative AI applications.

 

Using the data from ANN-Benchmarks, developers can make informed decisions about their generative AI applications. For instance, if the benchmarks show that a particular database offers the fastest query times with acceptable accuracy, it might be the best choice for applications in which real-time performance is most critical. On the other hand, if memory usage is a concern, a more memory-efficient database might be preferable, even if it means slightly longer query times. Ultimately, the insights gained from ANN-Benchmarks allow developers to strike the right balance among speed, accuracy, and resource consumption, ensuring that their generative AI application can operate effectively at the desired scale.

 

To get started with ANN-Benchmarks, SSH to your workstation VM and run the following commands:

 

git clone https://github.com/erikbern/ann-benchmarks.git
cd ann-benchmarks

 

Next, we’ll install Python 3.10, because it’s the validated version of Python for ANN-Benchmarks (these commands are for Ubuntu, and will vary across Linux distributions):

 

sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
sudo apt install -y python3.10 python3.10-distutils python3.10-venv

 

We’ll now set up our Python virtual environment and install the necessary Python packages:

 

python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install pymilvus

 

Finally, we’ll update the Milvus module to point at our Milvus load balancer service. Use your favorite text editor to open the following file:

 

vim ann_benchmarks/algorithms/milvus/module.py

 

Update line 25:

 

                self.connects.connect("default", host='localhost', port='19530')

 

Replace localhost with the external IP from the Milvus load balancer service gathered in the previous section (your IP will very likely be different):

 

                self.connects.connect("default", host='10.10.0.9', port='19530')

 

When the update is complete, save and quit your text editor.

 

ANN-Benchmarks: GloVe 100 Angular

 

We’re now ready to start our performance benchmarks with the following command (this command took around 8 hours in our environment):

 

python run.py --algorithm milvus-hnsw --local

 

The algorithm argument instructs ANN-Benchmarks to run the Milvus vector database tests, with the HNSW (hierarchical navigable small world graph) vector index. This vector index is a graph-based index with very high-speed queries; however, it does require high memory resources. The local argument instructs ANN-Benchmarks to run the tests “locally” rather than using a Docker container, but because we’re pointing at our external Milvus instance, these tests aren’t truly local.

 

For this test, we’re omitting the dataset argument, so ANN-Benchmarks falls back to the default GloVe 100 Angular dataset. This dataset is a collection of 100-dimensional word vectors from the Global Vectors for Word Representation project. “Angular” refers to the use of angular distance for measuring the similarity between vectors, focusing on the direction rather than the magnitude of the vectors. This dataset helps in evaluating the performance of vector databases in terms of speed and accuracy for tasks involving semantic word searches.

 

The output of the command should look like this:

 

$ python run.py --algorithm milvus-hnsw --local
downloading https://ann-benchmarks.com/glove-100-angular.hdf5 -> data/glove-100-angular.hdf5...
2024-08-24 01:27:27,035 - annb - INFO - running only milvus-hnsw
2024-08-24 01:27:27,377 - annb - INFO - Order: [Definition(algorithm='milvus-hnsw', constructor='MilvusHNSW', module='ann_benchmarks.algorithms.milvus', docker_tag='ann-benchmarks-milvus', arguments=['angular', 100, {'M': 16, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [600], [800]], disabled=False), Definition(algorithm='milvus-hnsw', constructor='MilvusHNSW', module='ann_benchmarks.algorithms.milvus', docker_tag='ann-benchmarks-milvus', arguments=['angular', 100, {'M': 48, 'efConstruction': 500}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [600], [800]], disabled=False), Definition(algorithm='milvus-hnsw', constructor='MilvusHNSW', module='ann_benchmarks.algorithms.milvus', docker_tag='ann-benchmarks-milvus', arguments=['angular', 100, {'M': 64, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [600], [800]], disabled=False),
...

 

After the GloVe 100 Angular dataset is downloaded, the list of tests to be executed is printed out. These tests vary a set of arguments:

 

  • M: the maximum number of outgoing connections in the graph (higher values result in higher accuracy and run time when other values are fixed)
  • efConstruction: the trade-off between index search speed and build speed (higher values may increase the index quality but also will lengthen the indexing time)
  • Query argument groups (ef): the trade-off between query time and accuracy (higher values result in searches that are more accurate but slower)
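As a sketch of how these arguments surface in client code, the following shows how they would map onto the pymilvus API. The collection and field names are hypothetical, and the metric choice is our assumption (ANN-Benchmarks maps angular datasets to inner product on normalized vectors); ANN-Benchmarks drives this through its own Milvus module rather than code like this.

```python
# How the three swept arguments map onto pymilvus index and search
# parameters. Nothing here touches the live cluster.
index_params = {
    "index_type": "HNSW",
    "metric_type": "IP",       # assumed metric for the angular dataset
    "params": {
        "M": 16,               # max outgoing connections per graph node
        "efConstruction": 200, # build-time search width
    },
}
search_params = {
    "metric_type": "IP",
    "params": {"ef": 200},     # query-time accuracy/speed trade-off
}

# Against the deployment from earlier, these would be applied roughly as:
#
#   from pymilvus import connections, Collection
#   connections.connect("default", host="10.10.0.9", port="19530")
#   coll = Collection("glove_100_angular")   # hypothetical collection name
#   coll.create_index("embedding", index_params)
#   coll.search(query_vectors, "embedding", search_params, limit=10)
```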

After the tests have been running for a period of time, open the metrics explorer page of the Google Cloud console, and type netapp.googleapis.com in the Select a Metric field. Select the NetApp Volumes operations count metric, and optionally add a filter depending on the number of volumes in your environment. Ensure that the persistent volume claims created during the deployment of Milvus are being utilized.

 

[Screenshot: NetApp Volumes operations count metric in the Google Cloud metrics explorer]

 

After several more hours, our tests will be complete, and we can move on to our next dataset.

 

ANN-Benchmarks: Sift 128 Euclidean

 

We’ll now run our next set of benchmarks, this time specifying the Sift 128 Euclidean dataset. This dataset consists of 128-dimensional vectors obtained using the scale-invariant feature transform algorithm for image feature extraction. It uses the Euclidean distance metric, which is the standard “straight-line” distance in the multidimensional space. The Sift dataset is essential for evaluating image matching, object recognition, and other computer vision tasks that require quick and reliable feature comparison.
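For comparison with the angular metric used earlier, a minimal sketch of Euclidean distance, where magnitude does matter:

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line (L2) distance, as used by the sift-128-euclidean
    dataset; unlike angular distance, vector magnitude matters here."""
    return np.linalg.norm(a - b)

a = np.array([0.0, 0.0, 0.0])
b = np.array([3.0, 4.0, 0.0])
print(euclidean_distance(a, b))  # classic 3-4-5 triangle: 5.0
```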

 

Back in your workstation, run the following command, and take note of the additional dataset argument:

 

python run.py --algorithm milvus-hnsw --local --dataset sift-128-euclidean

 

In our testing, this command took about 5 hours. When it’s complete, we can move on to the analysis of the results.

 

ANN-Benchmarks analysis

 

As detailed in the ANN-Benchmarks readme, there are several ways to analyze the results of the previous tests:

 

  • plot.py, which enables you to create individual charts with highly customizable x and y axes (run python plot.py --help to view all available options)
  • create_website.py, which generates a handful of HTML pages containing about a dozen charts
  • data_export.py, which exports all data to a CSV file, useful when additional data processing is needed

Feel free to use any of these methods; however, we’ll be using the create_website script, because it’s simple but still provides a large amount of insight. From your workstation VM, run the following command:

 

python create_website.py

 

If your VM has a desktop environment, you can open the milvus-hnsw.html file. Otherwise, copy it to your physical machine by using the following command:

 

scp <user>@<ip>:/home/<user>/ann-benchmarks/milvus-hnsw.html milvus-hnsw.html

 

This HTML page contains about a dozen graphs (which are included in the appendix). Let’s dig into a couple of them here.

 

[Chart: Recall vs. queries per second for Milvus HNSW on the GloVe 100 Angular and Sift 128 Euclidean datasets]

 

This chart helps visualize the performance of Milvus (using an HNSW vector index) when searching two different datasets. The axes represent recall and queries per second:

 

  • Recall is a measure of accuracy. It represents the fraction of the true nearest neighbors that the ANN search algorithm successfully retrieves. A recall of 0.9 (or 90%) means that the algorithm retrieved 9 of the 10 true nearest neighbors. Recall is crucial for understanding the quality of the search results provided by the algorithm.
  • Queries per second (QPS) measures the speed of the algorithm, indicating how many queries the algorithm can process in 1 second. A higher QPS value means the algorithm is faster, which is particularly important for applications that require real-time responses.
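Recall is simple to compute directly; a minimal sketch:

```python
def recall(true_neighbors, retrieved):
    """Fraction of the true nearest neighbors that the ANN search
    returned -- the recall metric plotted on the x-axis of these charts."""
    return len(set(true_neighbors) & set(retrieved)) / len(true_neighbors)

# The ANN search found 9 of the 10 true nearest neighbors:
truth = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
found = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]
print(recall(truth, found))  # 0.9
```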

As mentioned in the chart label, trendlines up and to the right are better, meaning Milvus HNSW performed consistently better on the Sift 128 Euclidean dataset than on the GloVe 100 Angular dataset. This result suggests that Milvus HNSW is better suited to computer vision tasks than to natural language processing. However, we recommend testing against additional datasets and vector databases to find the best match for your specific application.

 

Let’s view another graph:

 

[Chart: Index build time vs. recall for Milvus HNSW on both datasets]

 

Instead of purely showing results from the query phase as in the previous chart, this chart shows the trade-off between the amount of time it takes to build the index (y-axis) and the recall value (x-axis). Build time is a one-time cost to create the data structure and must be incurred before any queries can be processed.

 

Some vector databases and their underlying algorithms have a very fast build time but yield lower recall, making them suitable for applications where the index needs to be built (or rebuilt) quickly and frequently, and where perfect accuracy isn’t critical. On the other hand, some algorithms might take longer to build their indexes but provide higher recall, making them a better choice for applications where the accuracy of search results is most critical and the index doesn’t need to be updated as often.

 

Conclusion

 

In summary, the deployment of Milvus on Google Kubernetes Engine with Google Cloud NetApp Volumes showcases a powerful, scalable solution for managing high-dimensional vector data essential to generative AI. The integration of these technologies, tested rigorously through ANN-Benchmarks, provides clear insights into optimizing performance for AI applications.

 

The results confirm the necessity of selecting robust infrastructure to meet the demands of vector databases. GKE and NetApp Volumes offer the scalability and flexibility required for the evolving landscape of AI, so that as data and model complexities grow, the system remains capable and efficient.

 

This exploration into Milvus on GKE with Google Cloud NetApp Volumes equips developers and data scientists with valuable knowledge to fine-tune their generative AI applications, ensuring that they remain at the cutting edge of AI innovation. As AI progresses, the combination of advanced vector databases and cloud-native technologies will continue to be instrumental in driving forward the next generation of AI advancements.

 

Appendix

 

  • 04-results-relerror-qps.png
  • 05-results-eps-recall-01-qps.png
  • 06-results-eps-recall-1-qps.png
  • 07-results-recall-perc-50.png
  • 08-results-recall-perc-95.png
  • 09-results-recall-perc-99.png
  • 10-results-recall-perc-999.png

 
