Organizations are continuously seeking ways to extract more value from their data while minimizing operational friction. Advanced analytics is one way to do that.
Amazon Athena, a serverless interactive query service, has long enabled teams to analyze data stored in Amazon Simple Storage Service (Amazon S3) using standard SQL. But it’s not always easy to make file data stored in on-premises systems or outside of AWS accessible to Athena without carefully planned relocations to Amazon S3.
Now it’s possible to run Athena SQL queries directly on data stored in Amazon FSx for NetApp ONTAP (FSx for ONTAP) file systems, and a new world of possibilities is opening up for solution architects, data engineers, and IT leaders. This direct data access streamlines analytics, enhances data governance, and solves the long-standing challenges of data migration.
Read on as we cover:
- The challenges of migrating data to Amazon S3
- Direct SQL analytics with Athena on FSx for ONTAP
- The benefits of the FSx for ONTAP Athena integration
- Get your data to Athena without a migration
The challenges of migrating data to Amazon S3
In typical analytics workflows, preparing data for analysis with Athena often means moving datasets from enterprise storage systems into data lakes stored on Amazon S3. However, this data movement can be time-consuming and requires extra attention:
- Extract-transform-load (ETL) jobs: Data engineers must build, maintain, and monitor ETL pipelines, which require specialized skills and tools.
- Operational overhead: Data migration introduces latency, a risk of data inconsistency, and potential compliance issues due to storing data outside of on-premises data centers or specific regions.
- Resource costs: Duplicating large datasets consumes additional compute, network, and storage resources, driving up costs and slowing down project timelines.
These challenges can stall business insights and create friction between data producers and consumers.
Direct SQL analytics with Athena on FSx for ONTAP
With the new Amazon S3 Access Points for FSx for ONTAP capability, Amazon S3-based services can directly access file data stored in FSx for ONTAP using the Amazon S3 API. That means Athena can query data in your FSx for ONTAP volumes without migrating or duplicating data to Amazon S3. Data remains within the FSx for ONTAP file system, and users access it using familiar SQL syntax, just as they would with Amazon S3-based datasets.
This direct approach means analytics teams can easily run queries, generate reports, and build dashboards using the latest data, all while maintaining a single source of truth (SSoT).
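As a rough illustration of the workflow described above, the following Python sketch builds a table definition whose location points at an access point alias and then submits SQL to Athena with boto3. Treat it as a sketch under stated assumptions rather than a reference implementation: the function names, the database, table, and column names, the CSV layout, and the alias value are all hypothetical placeholders, and it assumes the access point alias can be used in the table location the same way a bucket name would be. The boto3 calls (`start_query_execution` and `get_query_execution`) are the standard Athena client operations.

```python
import re
import time


def build_external_table_ddl(database, table, columns, access_point_alias, prefix):
    """Build a CREATE EXTERNAL TABLE statement for CSV files behind an access point alias.

    All arguments are illustrative placeholders; `columns` maps column name to SQL type.
    """
    # Reject identifiers that are not plain lowercase names, since they are
    # interpolated directly into the SQL text.
    for name in (database, table, *columns):
        if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")

    column_sql = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())

    # Assumption: the alias stands in for a bucket name in the S3 URI.
    path = prefix.strip("/")
    location = f"s3://{access_point_alias}/{path}/" if path else f"s3://{access_point_alias}/"

    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def run_query(sql, output_location, poll_seconds=1.0):
    """Submit SQL to Athena and wait until the query reaches a terminal state."""
    import boto3  # imported lazily so the DDL builder works without AWS access

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        # Athena writes query results to this S3 location, supplied by the caller.
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With the table defined once through `run_query(build_external_table_ddl(...), output_location)`, later queries against it are ordinary SQL statements passed to `run_query`, and the files themselves stay on the FSx for ONTAP volume.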
Choosing FSx for ONTAP as the data source for Athena delivers several advantages:
- Multiprotocol access: FSx for ONTAP supports Network File System (NFS), Server Message Block (SMB), and now Amazon S3 protocols. This means the same data can be accessed by analytics tools, legacy applications, and AWS-native services without duplicating files or managing multiple silos. Learn more about the FSx for ONTAP multiprotocol capability.
- Non-disruptive operation: FSx for ONTAP lets users query source data directly, eliminating the need to transfer or manipulate data for use on Amazon S3, which reduces duplication and time delays. Athena can query live data, providing up-to-date insights without impacting your production applications.
- Extending on-premises deployments: With the help of NetApp® SnapMirror® data replication, on-premises ONTAP® users can easily replicate data to FSx for ONTAP for use with Athena. There’s no need for an ETL pipeline to get the data to Amazon S3, and the dataset is kept in sync seamlessly.
- Advanced data management: Features such as point-in-time NetApp Snapshot™ copies and rapid cloning allow teams to create safe, instant backups and DevTest environments, accelerating data-driven development and recovery processes. Learn more about the FSx for ONTAP cloning capability.
- Data protection: FSx for ONTAP provides robust data protection through point-in-time Snapshot copies and Autonomous Ransomware Protection (ARP), which automatically detects and mitigates ransomware threats. Business continuity is further enhanced by built-in data replication, which enables seamless backup and recovery across sites and AWS Regions, and by FSx for ONTAP cyber vault capabilities that secure critical backups in isolated environments for added resilience.
- Storage efficiency features: FSx for ONTAP offers built-in thin provisioning, data deduplication, compression, and compaction, as well as data tiering, all of which significantly reduce your storage footprint and its associated costs. Learn more about the FSx for ONTAP cost efficiency.
Deduplication also benefits performance when running analytics workloads on large volumes that contain similar or repetitive data.
The benefits of the FSx for ONTAP Athena integration
For cloud architects, data engineers, and IT leaders, the integration of Athena with FSx for ONTAP translates into real-world value:
- Flexibility: Multiprotocol access enables seamless collaboration between diverse teams and applications.
- Seamless access to on-premises data: With SnapMirror replicating data from ONTAP systems to FSx for ONTAP, Athena has seamless access to your datasets. No migrations required.
- Cost savings: Eliminating data migration and using the FSx for ONTAP storage efficiency features mean lower infrastructure bills and less time spent on operational tasks.
- Simplified workflows: With data available in place for both operational and analytical workloads, teams can move faster from raw data to actionable insights.
- Improved data governance: Maintaining a single, consistent copy of data simplifies access control, auditing, and meeting compliance goals. These are critical requirements for organizations in highly regulated sectors, such as the healthcare and financial industries.
Get your data to Athena without a migration
Running Athena queries directly on data stored in FSx for ONTAP is a game changer for organizations striving to modernize their analytics capabilities without the pain of migrating data to Amazon S3.
This new accessibility brings together the best of both worlds: instant, serverless SQL analysis and enterprise-grade storage with multiprotocol access, advanced data management, and storage efficiency.
By adopting Athena with FSx for ONTAP, cloud architects, data engineers, and IT leaders can accelerate insights, reduce costs, and strengthen data governance, empowering their teams to realize the full potential of their data, faster and more securely than ever before.
Learn more in: