Snowflake is a cloud data platform built for fast and flexible data warehousing. The Snowflake platform requires near-zero management for organizations to run analytics on their datasets. Until recently, organizations had to move their data into Snowflake storage to harness Snowflake’s unique data warehousing capabilities. However, organizations couldn’t move entire datasets into Snowflake for various reasons such as compliance, governance, or even data volume. To address this issue, Snowflake introduced External Tables to help make data that resides outside Snowflake storage available to Snowflake services.
The Snowflake External Tables capability was available only on major cloud platforms such as AWS, Google Cloud, and Microsoft Azure. But now, Snowflake has announced that it has extended External Tables to on-premises storage, so Snowflake users can bring Snowflake's useful tools to datasets on a private cloud. Today, storage platforms that are capable of talking S3 protocol can take advantage of Snowflake External Tables. Teams at NetApp and Snowflake worked closely together to help validate NetApp® StorageGRID®storage for External Tables.
StorageGRID is an enterprise-grade object storage solution that natively speaks the S3 API. StorageGRID creates a single namespace that can span 16 sites, enabling data sharing across users in multiple locations. The StorageGRID information lifecycle management (ILM) engine is a key differentiator among other object storage platforms in the market. The ILM engine enforces how and where your data resides on StorageGRID based on your organization’s data access, protection, and compliance needs.
How to enable Snowflake External Tables on StorageGRID
First, let’s upload the sample CSV files to be queried from Snowflake into an S3 bucket in StorageGRID. Here’s an image of what one of the CSV files looks like, followed by a screenshot of the S3 bucket after the sample files are uploaded.
After files are uploaded into the StorageGRID S3 bucket, let’s enable External Tables on that bucket.
Snowflake account information
S3 bucket name
S3 keys to access that bucket
Open a new worksheet in your Snowflake account and use the following code to create a warehouse, a database, and an external stage. After the external stage is successfully created, create an External Table, reference the external stage for the data location on the S3 bucket, and specify the file format.
Now you can view your data in your S3 bucket. The following screenshot shows the output after you run the worksheet.
You can also copy your External Tables data into a Snowflake table by using the copy intocommand to do in-place analytics, as shown in this example code:
To run the worksheet, add the above code to your existing worksheet to copy data into a Snowflake table. The following screenshot shows sample output.
As you can see from the example, it is very easy to configure and use this new capability from Snowflake to run queries on your on-prem data that resides on StorageGRID.
Key benefits of using StorageGRID for Snowflake External Tables
Maintain data sovereignty while harnessing Snowflake’s diverse capabilities.
Enable data sharing across users from different geo locations through StorageGRID multitenancy.
Optimize StorageGRID resources for your active and inactive datasets by using the powerful StorageGRID information lifecycle management engine.
Scale StorageGRID capacity, performance, and sites with low-touch operations.
With Snowflake extending the External Tables capability to on-premises environments, customers can now access their datasets on Snowflake storage, public cloud storage, and on-premises storage through a single pane of glass. It’s now easier than ever for users to collaborate inside and outside their organizations.