As more and more organizations choose to train or fine-tune their own AI models, the field of MLOps is becoming increasingly important. MLOps is a set of concepts and best practices that aim to streamline the development of AI models and their deployment in production. An ecosystem of tools and platforms has emerged alongside the MLOps field. Many of the leading MLOps tools are open-source. While these tools offer cutting-edge AI and data science features, they often lack enterprise-scale data management capabilities. Pairing these tools with an intelligent data infrastructure from NetApp can help to address this gap.
Open-source MLOps with NetApp data management
To this end, we have developed a solution that demonstrates leading open-source MLOps tools in a NetApp-based environment. A typical MLOps workflow incorporates development workspaces, usually taking the form of Jupyter Notebooks; experiment tracking; automated training pipelines; data pipelines; and inference/deployment. Our solution highlights several different tools and frameworks that can be used independently or in conjunction to address the different aspects of the workflow. We also demonstrate the integration of NetApp data management capabilities into each of these tools. This solution is intended to offer building blocks from which an organization can construct a customized MLOps workflow that is specific to their use cases and requirements. As of the time of this writing, this solution covers four popular open-source MLOps tools: Apache Airflow, JupyterHub, Kubeflow, and MLflow.
Apache Airflow
Apache Airflow is an open-source workflow management platform that enables programmatic authoring, scheduling, and monitoring for complex enterprise workflows. It is often used to automate ETL and data pipeline workflows, but it is not limited to these types of workflows. Airflow workflows are created via Python scripts, and Airflow is designed under the principle of "configuration as code."
As part of our solution, we demonstrate the usage of the NetApp DataOps Toolkit for Kubernetes in conjunction with Airflow. Our solution enables end users to incorporate NetApp data management operations, such as creating snapshots and clones, into automated workflows that are orchestrated by Airflow. Refer to the Airflow Examples section within the NetApp DataOps Toolkit GitHub repository for details on using the toolkit with Airflow.
JupyterHub
JupyterHub is a multi-user application that enables individual users to provision and access their own Jupyter Notebook server. Jupyter Notebooks are wiki-like documents that contain live code as well as descriptive text. Jupyter Notebooks are widely used in the AI and data science communities as a means of documenting, storing, and sharing AI and data science projects.
As part of our solution, we demonstrate the usage of the NetApp DataOps Toolkit for Kubernetes in conjunction with JupyterHub. Our solution enables end users, such as data scientists and AI developers, to create volume snapshots for workspace backup and/or dataset-to-model traceability directly from within their Jupyter Notebook. For more details, refer to the JupyterHub section within the solution documentation.
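As a rough sketch of what this looks like from a notebook, the cell below wraps a toolkit call in a small helper. The PVC name and namespace are assumptions (JupyterHub typically provisions one PVC per user workspace); check the toolkit documentation and your administrator for the actual values.

```python
# Hypothetical notebook cell: snapshot the volume backing this workspace.
from datetime import datetime, timezone


def workspace_snapshot_name(prefix: str = "workspace") -> str:
    """Build a timestamped snapshot name, e.g. 'workspace-20240101-120000'."""
    return f"{prefix}-{datetime.now(timezone.utc):%Y%m%d-%H%M%S}"


def snapshot_workspace(pvc_name: str, namespace: str) -> None:
    # Toolkit import kept inside the function so this cell parses even
    # where the toolkit isn't installed; see the toolkit README for the
    # exact signature.
    from netapp_dataops.k8s import create_volume_snapshot

    create_volume_snapshot(
        pvc_name=pvc_name,
        namespace=namespace,
        snapshot_name=workspace_snapshot_name(),
        print_output=True,
    )


# In a notebook you would then run, for example:
# snapshot_workspace(pvc_name="claim-mynotebook", namespace="jupyterhub")
```

Taking a snapshot immediately before a risky experiment gives the user a cheap restore point, and recording the snapshot name alongside a trained model provides workspace-to-model traceability.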
Kubeflow
Kubeflow is a Kubernetes-based AI development and deployment platform that includes many different components. Therefore, Kubeflow is a popular open-source option for organizations that prefer an all-in-one MLOps platform. Two of the most widely-used Kubeflow components are Kubeflow Pipelines, a framework for defining and executing AI and data pipeline workflows, and Kubeflow Notebooks, a component that simplifies the provisioning and deployment of Jupyter Notebook Servers on Kubernetes.
As part of our solution, we demonstrate the usage of the NetApp DataOps Toolkit for Kubernetes in conjunction with Kubeflow Pipelines and Kubeflow Notebooks. We show how end users, such as data scientists and AI developers, can create volume snapshots for workspace backup and/or dataset-to-model traceability directly from within their Jupyter Notebook within Kubeflow. We also show how advanced NetApp data management operations, such as creating volume snapshots and clones, can be incorporated into automated workflows using the Kubeflow Pipelines framework. For more details, refer to the Kubeflow section within the solution documentation.
MLflow
MLflow is a popular open-source AI lifecycle management platform. Key features of MLflow include AI training experiment tracking and an AI model repository. Experiment tracking is a key best practice within the MLOps field, and as a result MLflow has seen widespread adoption as AI has grown in popularity.
As part of our solution, we demonstrate the usage of the NetApp DataOps Toolkit for Kubernetes in conjunction with MLflow in order to implement dataset-to-model or workspace-to-model traceability within the MLflow experiment tracking platform. For more details, refer to the MLflow section of the solution documentation.
What's next?
Stay tuned – we are continually expanding the scope of our solution as the MLOps ecosystem grows and evolves. In the meantime, you can find the documentation for this solution here. To learn more about NetApp’s solutions for AI, visit netapp.com/ai.