Organizations of all sizes and across many industries are turning to artificial intelligence (AI) to solve real-world problems, deliver innovative products and services, and get an edge in an increasingly competitive marketplace. As organizations increase their use of AI, they face many challenges, including workload scalability and data availability. According to survey data, many projects remain stuck in the proof-of-concept phase because these challenges have not been overcome. A large ecosystem of MLOps (machine learning operations) tools and platforms has emerged to address this problem. Many of the leading tools are open-source, but unfortunately the learning curve can be steep for traditional IT departments. We are constantly striving to help our customers implement these workflows and to accelerate them with NetApp technology.
Blueprint for open-source MLOps
We have developed a solution that pairs open-source MLOps tools with intelligent data infrastructure from NetApp. This solution is intended to demonstrate how popular open-source MLOps tools can be augmented with NetApp capabilities and incorporated into an MLOps workflow. These different tools and frameworks can be used together or by themselves depending on the requirements and use case. Our solution currently covers two popular open-source MLOps tools – Kubeflow and Apache Airflow.
Kubeflow is an open-source AI toolkit for Kubernetes that was originally developed by Google. The Kubeflow project makes deployments of AI workflows on Kubernetes simple, portable, and scalable. Kubeflow abstracts away the intricacies of Kubernetes, allowing data scientists to focus on what they know best: data science. Kubeflow is a good open-source option for organizations that prefer an all-in-one MLOps platform.
Apache Airflow is an open-source workflow management platform that enables programmatic authoring, scheduling, and monitoring of complex enterprise workflows. It is often used to automate ETL and data pipeline workflows, but it is not limited to these types of workflows. The Airflow project was started by Airbnb, has since become very popular in the industry, and now falls under the auspices of The Apache Software Foundation. Airflow is written in Python, Airflow workflows are created via Python scripts, and Airflow is designed under the principle of "configuration as code." Many enterprise Airflow users now run Airflow on top of Kubernetes.
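To make the "configuration as code" idea concrete, here is a minimal sketch of a workflow that is declared in Python and then resolved into a valid run order. It uses only the Python standard library and is not the Airflow API. The task names and the `execution_order` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
    "report": {"train"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order(workflow):
        print(f"running {task}")
```

Because the workflow is ordinary Python, it can be versioned, reviewed, and tested like any other code, which is the main appeal of this approach.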
Accelerating AI workflows with NetApp technology
We have demonstrated methods for augmenting these open-source MLOps tools with NetApp technologies to unlock the following key capabilities:
- Data scientists can perform advanced NetApp data management operations, such as creating snapshots and clones, directly from within a Jupyter Notebook.
- Advanced NetApp data management operations, such as creating snapshots and clones, can be incorporated into automated data pipeline workflows using the Kubeflow Pipelines and Apache Airflow frameworks.
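As a rough sketch of the second capability, the example below shows where snapshot and clone steps might sit in an automated pipeline: capture the dataset before training, then train against a clone so the source volume is untouched. The `InMemoryVolume` class is a hypothetical stand-in written for this example. It is not a NetApp, Kubeflow, or Airflow API, and it imitates snapshots and clones by copying a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryVolume:
    """Hypothetical stand-in for a storage volume (illustration only)."""
    name: str
    data: dict = field(default_factory=dict)
    snapshots: dict = field(default_factory=dict)

    def create_snapshot(self, snapshot_name):
        # Record a point-in-time view of the volume's current contents.
        self.snapshots[snapshot_name] = dict(self.data)

    def clone(self, snapshot_name, clone_name):
        # Create a new, independent volume from an existing snapshot.
        return InMemoryVolume(clone_name, dict(self.snapshots[snapshot_name]))


def run_pipeline(volume, run_id):
    """Snapshot the dataset, clone it, and 'train' inside the clone."""
    snapshot_name = f"pre-train-{run_id}"
    volume.create_snapshot(snapshot_name)            # step 1: capture the dataset
    workspace = volume.clone(snapshot_name, f"workspace-{run_id}")  # step 2: clone
    file_count = len(workspace.data)                 # step 3: placeholder training
    workspace.data["model"] = f"trained-on-{file_count}-files"
    return snapshot_name, workspace


if __name__ == "__main__":
    datasets = InMemoryVolume("datasets", {"a.csv": 1, "b.csv": 2})
    snapshot, workspace = run_pipeline(datasets, "001")
    print(snapshot, workspace.name, workspace.data["model"])
```

Recording the snapshot name alongside each training run keeps the run traceable to the exact dataset version it used.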
Stay tuned – we will be adding more open-source MLOps tools to this solution in the future. In the meantime, you can find the documentation for this solution here. To learn more about NetApp’s solutions for AI, visit netapp.com/ai.