Data fuels AI, and data runs on NetApp. NetApp and AWS continue to work together to offer a GenAI-ready storage solution for AI/ML services such as Amazon Bedrock and Amazon SageMaker. Amazon FSx for NetApp ONTAP now supports dual-protocol access (NFS and S3) for AI, machine learning, and data science workloads.
FSx for ONTAP dual-protocol access
FSx for ONTAP now supports dual-protocol access, enabling data engineers and data scientists to access and share the same data over both NFS and S3 without additional software. Here’s an example. A team of data scientists might be working on a machine learning project that reads its data over NFS. However, the same data might also need to be shared through S3 buckets, either to collaborate with other team members or to integrate with applications that use S3.
By using FSx for ONTAP, the team can store its data in a single location and make it accessible over both NFS and S3. The data scientists and data engineers can access the data over NFS directly from SageMaker notebooks, while other team members or applications access the same data through S3 buckets.
This approach enables the data to be accessed and shared easily and efficiently without the need for additional software or data migration between different storage solutions. It also allows for a more streamlined workflow and collaboration among team members, resulting in faster and more effective development of machine learning models.
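To make the dual-protocol idea concrete, the Python sketch below translates between a file path seen by an NFS client and the S3 URI of the same data. It assumes an S3 bucket has been configured on the same FSx for ONTAP volume and that object keys equal file paths relative to the NFS mount point. The mount point, the bucket name, and both helper functions are hypothetical; they illustrate the mapping only and are not an FSx or ONTAP API.

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration only: the NFS mount point on the
# notebook host, and an S3 bucket assumed to be mapped to the same volume.
NFS_MOUNT_POINT = PurePosixPath("/mnt/fsx/training-data")
S3_BUCKET = "training-data-bucket"


def nfs_path_to_s3_uri(nfs_path: str) -> str:
    """Translate a file path under the NFS mount into the S3 URI of the same data.

    Assumes the bucket is rooted at the volume's junction path, so the object
    key is the file's path relative to the mount point. Raises ValueError if
    nfs_path is not under the mount point.
    """
    relative = PurePosixPath(nfs_path).relative_to(NFS_MOUNT_POINT)
    return f"s3://{S3_BUCKET}/{relative.as_posix()}"


def s3_key_to_nfs_path(key: str) -> str:
    """Translate an S3 object key back into the path an NFS client would open."""
    return str(NFS_MOUNT_POINT / key.lstrip("/"))
```

Under these assumptions, `nfs_path_to_s3_uri("/mnt/fsx/training-data/images/cat.jpg")` returns `s3://training-data-bucket/images/cat.jpg`, so a notebook reading the file over NFS and an application calling S3 address the same data without a copy.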
FSx for ONTAP and Amazon Bedrock
Amazon Bedrock is a fully managed service that makes foundation models from Amazon and leading AI startups available through an API. Working with AWS, we want to help data engineers and data scientists harness the value of their enterprise data with GenAI, so they can gain relevant business insights more productively and efficiently. Example applications include augmented chatbots, enterprise search, document summarization, text-to-image generation, and code copilots.
Using FSx for ONTAP in a Bedrock workflow keeps enterprise data close to the GenAI applications, which helps performance and simplifies management. FSx for ONTAP also offers data mobility (from on premises to the AWS Cloud), data protection and availability, and versioning through space-efficient clones.
Below is a typical architecture for a GenAI-powered chatbot using FSx for ONTAP and Bedrock.
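Whatever the exact architecture, a retrieval-augmented chatbot typically looks up relevant passages in enterprise data and adds them to the prompt before calling the model. The Python sketch below shows only that retrieval-and-prompt step, assuming the documents are plain-text files on an FSx for ONTAP volume mounted over NFS. It deliberately swaps in a naive keyword-overlap score for the embedding search and vector store a production chatbot would use, and the function names and mount path are hypothetical.

```python
import re
from pathlib import Path


def load_documents(docs_dir: str) -> dict[str, str]:
    """Read every .txt file in docs_dir (for example, a hypothetical NFS mount
    such as /mnt/fsx/knowledge-base) into a {file name: text} mapping."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(docs_dir).glob("*.txt"))
    }


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(docs: dict[str, str], question: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Return up to top_k (name, text) pairs ranked by keyword overlap with the question.

    Naive stand-in for embedding-based semantic search; documents that share
    no tokens with the question are skipped.
    """
    query = tokenize(question)
    scored = []
    for name, text in docs.items():
        overlap = len(query & tokenize(text))
        if overlap:
            scored.append((overlap, name, text))
    # Highest overlap first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, text) for _, name, text in scored[:top_k]]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Combine the retrieved passages and the user question into one prompt string."""
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting prompt would then be sent to a foundation model through Amazon Bedrock, for example with the boto3 `bedrock-runtime` client's `converse` or `invoke_model` operation. That call is left out so the sketch runs without AWS credentials.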
FSx for ONTAP and SageMaker
Amazon SageMaker is a fully managed machine learning service, offering data engineers and data scientists the capabilities to create, train, and deploy high-quality ML models efficiently.
A typical scenario is to burst from an on-premises NetApp® ONTAP® workload into Amazon SageMaker notebooks. Here are the six steps to complete the workflow.
- The data engineer deploys FSx for ONTAP from the AWS Management Console and creates a volume, exported over NFS, to serve as the destination for the training data.
- The training data is generated and stored in the on-premises NetApp ONTAP storage system.
- Data is replicated from the on-premises ONTAP system to FSx for ONTAP using NetApp SnapMirror®.
- The data scientist creates a Jupyter notebook using SageMaker.
- The data scientist mounts the destination volume on the notebook instance to access the training data.
- The data scientist uses SageMaker, the fully managed ML service, to build, train, and deploy a large language model (LLM).
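Mounting the destination volume on the notebook instance is a standard NFS mount. The Python sketch below only assembles that command so it can be reviewed before being passed to `subprocess.run`. The SVM DNS name, junction path, mount point, and the `nfsvers=4.1` option are placeholder assumptions; replace them with the values for your own file system and storage virtual machine, and create the mount point directory first.

```python
import shlex
import subprocess


def build_nfs_mount_command(
    svm_dns_name: str,
    junction_path: str,
    mount_point: str,
    nfs_version: str = "4.1",
) -> list[str]:
    """Return the argv for mounting an FSx for ONTAP volume over NFS.

    All values are caller-supplied placeholders; nothing is looked up from AWS.
    """
    if not junction_path.startswith("/"):
        raise ValueError("junction_path must be absolute, for example /vol1")
    return [
        "sudo", "mount", "-t", "nfs",
        "-o", f"nfsvers={nfs_version}",
        f"{svm_dns_name}:{junction_path}",
        mount_point,
    ]


def mount(cmd: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False
    (requires sudo rights and network access to the SVM)."""
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

For example, with a hypothetical SVM name and a mount point under `/home/ec2-user/SageMaker`, `mount(build_nfs_mount_command("svm-0123.fs-0456.fsx.us-east-1.amazonaws.com", "/vol1", "/home/ec2-user/SageMaker/training-data"))` returns the command as a string without running it, which makes the dry run the safe default.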
FSx for ONTAP and Apache Kafka
Kafka workloads in production applications can stream huge amounts of data between applications. Network File System (NFS) is a widely used protocol for accessing shared file storage that holds large amounts of data. These workloads require scalability, low latency, and a robust data ingestion architecture backed by modern storage capabilities. Enabling real-time analytics and actionable insights requires a well-designed, high-performance infrastructure.
FSx for ONTAP provides a fully managed, scalable, and highly performant NFS file system in the cloud. Kafka data on FSx for ONTAP can scale to handle large amounts of data and ensure fault tolerance. NFS provides centralized storage management and data protection for critical and sensitive datasets.
These capabilities make it possible for AWS users to take advantage of FSx for ONTAP when running Kafka workloads on AWS compute services. The benefits include:
- Reduced CPU utilization and I/O wait time
- Faster Kafka broker recovery time
- Reliability and efficiency
- Scalability and performance
- Access across multiple Availability Zones
- Data protection
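Running Kafka on FSx for ONTAP largely comes down to placing each broker's log directory on the NFS mount. The Python sketch below renders a minimal broker properties file that does this. `broker.id` and `log.dirs` are standard Kafka broker settings, while the mount path, the per-broker directory naming, and the `render_broker_properties` helper are illustrative assumptions; a real deployment would also need listener, replication, and cluster-metadata settings.

```python
from typing import Optional


def render_broker_properties(
    broker_id: int,
    fsx_mount_point: str,
    extra: Optional[dict[str, str]] = None,
) -> str:
    """Render server.properties text whose log.dirs sits on an NFS-mounted FSx volume.

    Each broker gets its own subdirectory, because brokers should not share a
    log directory.
    """
    props = {
        "broker.id": str(broker_id),
        "log.dirs": f"{fsx_mount_point.rstrip('/')}/kafka-logs-{broker_id}",
    }
    if extra:
        props.update(extra)  # e.g. {"num.partitions": "6"}
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

For example, `render_broker_properties(1, "/mnt/fsx")` yields the lines `broker.id=1` and `log.dirs=/mnt/fsx/kafka-logs-1`. Writing one such file per broker keeps every broker's data on the shared file system while avoiding directory collisions.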
VPC Peering for Single-AZ deployments on VMware Cloud on AWS
Reducing total cost of ownership (TCO) and infrastructure complexity is top of mind for most companies. VPC peering helps in both areas by connecting NFS datastores on Single-AZ FSx for ONTAP file systems to ESX clusters in a software-defined data center (SDDC). This option, announced by VMware and now generally available, simplifies setup, improves performance, and reduces TCO relative to the existing VMware Transit Gateway (vTGW) solution. Unlike the vTGW solution, VPC peering incurs no data transfer charges when it connects the VMware cluster to a Single-AZ FSx for ONTAP file system in the same Availability Zone, which can save up to 25%. You can also calculate your potential savings using our VMware Cloud and FSx for ONTAP TCO tool. Visit VMware Cloud on AWS to learn more.
Key takeaways
Data scientists and data engineers working with AI/ML services like Amazon Bedrock and Amazon SageMaker often need to access the same data over both NFS and S3. Amazon FSx for NetApp ONTAP addresses this need with dual-protocol access, so teams can access and share the same data over NFS and S3 without additional software.
NetApp is working with AWS to create data pipelines to unlock data for GenAI. Using FSx for NetApp ONTAP, data engineers and data scientists can leverage:
- Amazon SageMaker notebooks to easily collaborate on building, training, and deploying LLMs
- Amazon Bedrock, a fully managed service that makes foundation models from Amazon and leading AI startups available through an API
- Apache Kafka to process real-time feeds from various data sources
Only NetApp enables you to integrate, access, and manage the full lifecycle of your AI data—anywhere. For more details, visit the FSx for ONTAP overview or talk with an AWS specialist.