
Agentic AI in action: automated cloud bursting when GPU capacity is reached

moglesby
NetApp

The rise of agentic AI is unlocking new automation use cases. AI agents powered by LLMs can handle fuzzy inputs and perform sophisticated reasoning, automating workflows that previously would have required complex custom code. However, these AI agents are useless without tools. Tools enable AI agents to perform actions in the digital world. Because the popular agentic AI frameworks are all Python-based, most agentic AI tools currently take the form of Python functions. For agentic AI to be adopted in the enterprise, it is critical that these tools be reliable and secure.

 

NetApp-powered agentic AI tools

 

NetApp's client libraries and APIs provide a set of enterprise-grade, secure tools that AI agents can utilize. As agentic AI rapidly increases in popularity, we set out to validate the use of these NetApp-powered tools as part of agentic AI systems that automate important data management workflows.

 

AI cloud bursting

 

We were particularly excited to automate an AI cloud bursting workflow using agentic AI and NetApp tools. As organizations of all shapes and sizes race to realize the promise of AI, AI factory environments are rapidly becoming compute-constrained. AI workloads often have to wait in a queue until GPU resources are available, sometimes for days, weeks, or even months. This logjam can significantly delay the realization of business value from AI. Therefore, many organizations are interested in temporarily "bursting" AI workloads to the cloud in order to take advantage of available GPUs and accelerate their AI initiatives.

 


 

However, AI workloads require data, and while bursting the jobs themselves may be relatively simple, bursting the associated data is not. Copying data to the cloud is a complicated and cumbersome process, and creating a new data copy in a different environment poses a significant governance concern. Because of this, many organizations have given up on cloud bursting altogether, and their AI workloads languish in long queues.

 


 

Fortunately, NetApp FlexCache technology alleviates these concerns. With FlexCache, you can create a cache of a NetApp ONTAP volume in a remote environment. For example, if you have an ONTAP volume in your on-premises datacenter and you need to access that data in AWS, you can create a cache of the volume in AWS using Amazon FSx for NetApp ONTAP. With FlexCache, there is no remote copy to manage; instead, the source data is accessed through the cache. This is a highly efficient process: data blocks are cached in the remote environment as they are accessed, so you never move data blocks that you don't actually need. Additionally, all writes go back to the original source. When you no longer need the data in the remote environment, you can simply delete the FlexCache and be assured that you have no untracked data copies hanging around anywhere.
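
To make this concrete, here is a minimal sketch of what creating a FlexCache can look like with the NetApp ONTAP Python client library (netapp_ontap). The cluster address, credentials, SVM, volume, and aggregate names below are placeholders, and prerequisites such as cluster and SVM peering are assumed to already be in place; consult the library documentation for the details of your environment.

```python
from netapp_ontap import config, HostConnection
from netapp_ontap.resources import Flexcache

# Connect to the cluster (or FSx for NetApp ONTAP file system) that will host the cache.
# Hostname and credentials are placeholders.
config.CONNECTION = HostConnection(
    "cache-cluster-mgmt.example.com",
    username="admin",
    password="********",
    verify=False,
)

# Define a FlexCache volume whose origin is the source volume in the other environment.
# SVM, volume, and aggregate names are placeholders.
cache = Flexcache(
    name="ai_workspace_cache",
    svm={"name": "cache_svm"},
    aggregates=[{"name": "aggr1"}],
    origins=[{"volume": {"name": "ai_workspace"}, "svm": {"name": "origin_svm"}}],
)
cache.post()  # Issues the REST call that provisions the cache volume
```

Cleaning up later follows the same pattern: the same Flexcache resource can be retrieved and deleted once the remote work is done, removing the cache while leaving the origin volume untouched.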

 

Automated cloud bursting with agentic AI

 


 

We successfully demonstrated the automation of a cloud bursting workflow using the NetApp ONTAP Python client library, NVIDIA NIM for LLMs, and CrewAI, an open-source agentic AI framework. We defined the following components for our agentic AI system (a simplified wiring sketch follows the list):

 

  • LLM: Meta Llama 3 70B (deployed using NVIDIA NIM for LLMs)
  • Tools:
    • "Check GPU utilization tool" - Python function for checking current GPU utilization.
    • "Evaluate GPU count tool" - Python function for evaluating the current GPU utilization.
    • "Create FlexCache tool" - Python function for creating a FlexCache for cloud bursting (utilizes the NetApp ONTAP Python client library).
  • Agents:
    • "Compute Admin" - Agent that checks compute utilization.
    • "Workload Bursting Admin" - Agent that burst workloads to the cloud.
  • Tasks:
    • "Check GPU utilization" - Task for checking GPU utilization.
    • "Cache AI workspace" - Task for creating a cache of the AI workspace in the cloud using FlexCache.

 

To see our agentic AI proof of concept in action, check out our demo:

 

 

Learn more

 

Stay tuned as we continue to explore this emerging agentic AI paradigm. To learn more about the NetApp ONTAP Python client library, click here. To learn more about NetApp's AI solutions, visit netapp.com/ai.
