Tech ONTAP Blogs
The enterprise AI landscape is evolving rapidly. Organizations are moving beyond pre-built AI models, investing instead in training and fine-tuning models on their own data. This shift brings tremendous opportunity, but also introduces new challenges around data infrastructure performance, management, and security. To meet these demands, FlexPod AI introduces a new model training solution that is built on the Cisco UCS C885A server and delivers the comprehensive security framework essential for enterprise AI deployments.
Pre-trained AI models are a great starting point, but they only get you so far without customization options that reflect an organization’s unique terminology, processes, and domain expertise. To unlock real business value, organizations need the ability to train custom models on proprietary data and fine-tune foundation models.
Training in your own data center also addresses critical concerns around data sovereignty and security. For organizations in regulated industries like healthcare, financial services, and government, keeping sensitive training data on premises is often a requirement rather than a choice. And for any organization looking to iterate quickly, local training eliminates cloud egress costs and latency while accelerating time-to-insight.
FlexPod AI introduces a new model training solution designed for large-scale, GPU-accelerated workloads. This validated, turnkey platform, FlexPod AI for Model Training, brings together best-in-class infrastructure components to enable faster, more secure iteration across AI training workflows.
The FlexPod AI for model training architecture is designed to deliver several key benefits for enterprises running production-scale AI training workloads.
Together, these tightly integrated components form a platform that translates architectural design into measurable business outcomes for AI training teams.
While model training represents a growing opportunity, it is only one part of a broader set of AI workflows enterprises need to support. The new model training solution joins a comprehensive portfolio of FlexPod AI solutions that address the full spectrum of enterprise AI requirements:
| Solution | Primary Use Case |
|---|---|
| | LLM/SLM training, fine-tuning, HPC |
| | Production GenAI deployment |
| | Retrieval-Augmented Generation with NVIDIA NIM |
| | End-to-end ML lifecycle management |
| | General GPU-accelerated AI/ML and scaling guidance |
| | SUSE Rancher Kubernetes-based AI/ML with multi-cluster management |
Whether you are starting with inferencing, building RAG pipelines, or training custom models, FlexPod AI provides a consistent, validated infrastructure foundation that grows with your AI initiatives.
As AI adoption accelerates, new security challenges emerge, making security an integral component of any AI deployment. AI systems often have access to sensitive data, and trained models themselves can represent valuable intellectual property. Rapid deployments may bypass security best practices, resulting in shadow IT implementations, ungoverned data pipelines, and ad-hoc infrastructure deployments.
Without the appropriate security controls, this creates meaningful risk and increases exposure to data breaches, model theft, and compliance violations.
Tackling these risks begins with designing security into the infrastructure itself rather than adding it later. You can’t bolt security on after the fact and expect it to be effective. As a foundational component of your infrastructure, security must be built in from the ground up.
FlexPod delivers this secure foundation through the combined capabilities of its core components. Cisco UCS and Nexus provide hardware root of trust, secure boot, and encrypted communications. NetApp ONTAP, which NetApp positions as the most secure storage on the planet, adds its own secure boot and hardware root of trust along with encryption for data in flight and at rest, immutable snapshots, ransomware detection, rapid recovery, and more.
This foundation matters for AI workloads because training data, model weights, and inference results all need to operate securely. When your infrastructure is secure by design, you can focus on building AI solutions rather than worrying about whether your data is protected.
Building on the secure foundation, FlexPod provides a comprehensive set of security solutions that give you definitive guidance on hardening your environment and protecting against threats.
These solutions work together to create a defense-in-depth approach that protects your infrastructure, your data, and your AI workloads.
While a secure foundation and comprehensive infrastructure hardening are essential, AI workloads also have unique security requirements that demand purpose-built solutions. This is where Cisco Secure AI Factory with FlexPod comes in.
As discussed in a recent blog post, Secure AI Factory with FlexPod AI, this solution addresses the specific security challenges of AI deployments. It provides visibility into AI workloads, protects AI data pipelines, and helps ensure that AI systems are deployed in compliance with organizational policies. Cisco Secure AI Factory is a reference architecture built on the strong foundations already in place with FlexPod AI solutions, extending those capabilities to meet the unique demands of AI security.
Beyond infrastructure security, managing and protecting AI data at scale requires intelligent tooling. NetApp is delivering new capabilities that expand what your data can offer:
These capabilities complement the Cisco Secure AI Factory architecture by providing the high-performance storage and data management intelligence that AI workloads require. Brought together, they create a comprehensive approach to AI security that addresses infrastructure, network, and data protection.
Cisco Secure AI Factory integrates with the FlexPod security foundation and hardening guidance to deliver a complete security solution for enterprise AI. You get the performance and scalability of FlexPod AI combined with the security controls that modern AI deployments require.
Enterprise AI is moving fast, and organizations need infrastructure that can keep pace. FlexPod AI for Model Training with the Cisco UCS C885A server delivers the performance required for the most demanding training workloads, while the broader FlexPod AI portfolio addresses use cases from inferencing to RAG pipelines to MLOps.
Performance alone is not enough. AI systems must be built on a secure foundation, hardened against threats, and protected by purpose-built security solutions. FlexPod provides all of this through its secure-by-design architecture, comprehensive hardening guides, Zero Trust framework, and Cisco Secure AI Factory integration.
This comprehensive approach to enterprise AI is only possible through the strong partnership between NetApp, Cisco, and NVIDIA. Each partner brings critical capabilities to the table: NVIDIA delivers the GPU compute and AI software stack, Cisco provides the compute and networking infrastructure along with security expertise, and NetApp contributes high-performance storage and intelligent data management. Together, these partners have built and validated a complete platform that addresses the full AI lifecycle from training to inferencing, all on a secure foundation.
Whether you are just starting your AI journey or scaling existing initiatives, FlexPod AI gives you the performance, flexibility, and security you need to succeed. And, for the most extreme AI and HPC environments where sustained parallel I/O and scratch performance dominate, NetApp also offers purpose-built EF-Series solutions paired with parallel file systems such as Lustre.
Explore the FlexPod AI for Model Training deployment guide for configuration and implementation details.
To learn more about FlexPod, FlexPod AI for Model Training, and the complete FlexPod AI portfolio, contact your Cisco or NetApp account team.