Organizations of all sizes, around the world and across industries, are using data lakes to transform data from a cost that must be managed into a valuable business asset. For many of these organizations, S3 object storage is becoming the de facto storage for their data lake because it offers virtually unlimited capacity, durability, elasticity, and cost-effective ways of storing and managing data. That’s why it’s important to empower data enthusiasts with easier, cleaner, and faster ways to access their data so they can analyze it, gather business insights, and make important business decisions.
With this goal in mind, the NetApp® StorageGRID® team is excited to announce the new S3 Select capability in StorageGRID 11.6. S3 Select solves a common challenge: data enthusiasts can now retrieve just a subset of a dataset instead of retrieving an entire object and then extracting the part they need.
In object storage platforms, data is traditionally accessed as whole entities. When you ask for a 3GB object, you get the full 3GB object; that’s just how object storage works. As a result, getting a portion of that 3GB of data is a tedious process. You must first retrieve the full object and then extract the desired portion of the data at the application level. This process can be time consuming, and retrieving even a small portion of the data means transferring the entire object over the network. With S3 Select, you can extract the desired portion of the data directly, which gives you faster access and less network traffic for such operations.
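To make the traditional approach concrete, here is a minimal sketch of application-level filtering. It assumes the object is a small CSV file with a header row; the sample data, the column names, and the helper `filter_rows_client_side` are hypothetical and stand in for the body of a whole-object GET.

```python
import csv
import io


def filter_rows_client_side(object_bytes, column, value):
    """Parse a fully downloaded CSV object and keep only the matching rows."""
    reader = csv.DictReader(io.StringIO(object_bytes.decode("utf-8")))
    return [row for row in reader if row[column] == value]


# Hypothetical stand-in for the body of a whole-object GET.
# A real object could be gigabytes in size.
whole_object = b"id,region,amount\n1,emea,10\n2,apac,25\n3,emea,7\n"

matches = filter_rows_client_side(whole_object, "region", "emea")

# Every byte of the object crossed the network before any filtering happened.
bytes_transferred = len(whole_object)
```

Only two of the three rows are needed here, yet the application still pays for transferring and parsing the whole object. That overhead grows with object size.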
The StorageGRID S3 Select feature allows you to use SQL statements to filter the contents of an S3 object on StorageGRID and retrieve just the subset of data that you request. Filtering at the storage layer reduces the amount of S3 data that is transferred, which lowers both the latency of the operation and the traffic on your network. It also offloads filtering work from your application to your object storage. AWS S3 lets you retrieve data through a REST API, which was game changing when it was introduced. S3 Select takes this a step further: you send a query through the API and get back only the portion of the object that you asked for.
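As an illustration, the following sketch sends a SQL query with the `select_object_content` call of the boto3 S3 client. It assumes that StorageGRID accepts the same request shape as the AWS API, that credentials come from your environment, and that the object is a CSV file with a header row. The endpoint URL, bucket, key, and column names are hypothetical, as are the helper functions.

```python
def build_select_request(bucket, key, sql):
    """Build the keyword arguments for an S3 SelectObjectContent call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumption: the first row of the CSV object holds column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(event_stream):
    """Join the payload bytes of all Records events in the response stream."""
    chunks = []
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)


def run_select(endpoint_url, bucket, key, sql):
    """Run the query against an S3-compatible endpoint and return result bytes."""
    import boto3  # third-party dependency: pip install boto3

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    response = s3.select_object_content(**build_select_request(bucket, key, sql))
    return collect_records(response["Payload"])


# Hypothetical usage; the endpoint, bucket, and key are placeholders:
# rows = run_select(
#     "https://storagegrid.example.com:10443",
#     "sales-data",
#     "2022/orders.csv",
#     "SELECT s.id, s.amount FROM S3Object s WHERE s.region = 'emea'",
# )
```

The request builder and the stream reader are kept separate from the network call so that each piece can be checked on its own. Only the matching rows come back in the `Records` events, not the whole object.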
Today, StorageGRID S3 Select works on objects stored in CSV format, including CSV objects that are compressed with GZIP or BZIP2. You can specify CSV as the format of the results, and you can determine how the records in the result are delimited. StorageGRID will support more formats in future releases. Check out the full list of supported clauses, data types, and operators.
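These format options map onto the serialization settings of a SelectObjectContent request. The sketch below builds them using the field names of the AWS S3 Select API (`CompressionType`, `FieldDelimiter`, `RecordDelimiter`), on the assumption that StorageGRID accepts the same fields. The helper `csv_serialization` is hypothetical, and `FileHeaderInfo` is set to `USE` on the assumption that the object has a header row.

```python
SUPPORTED_COMPRESSION = {"NONE", "GZIP", "BZIP2"}


def csv_serialization(compression="NONE", field_delimiter=",", record_delimiter="\n"):
    """Return (input, output) serialization settings for a CSV S3 Select request."""
    if compression not in SUPPORTED_COMPRESSION:
        raise ValueError(f"unsupported compression type: {compression}")
    input_serialization = {
        # Assumption: the first row of the object holds column names.
        "CSV": {"FileHeaderInfo": "USE"},
        "CompressionType": compression,
    }
    output_serialization = {
        "CSV": {
            "FieldDelimiter": field_delimiter,
            "RecordDelimiter": record_delimiter,
        }
    }
    return input_serialization, output_serialization


# Read a GZIP-compressed CSV object and return pipe-delimited result records.
input_ser, output_ser = csv_serialization("GZIP", field_delimiter="|")
```

The two dictionaries would be passed as the `InputSerialization` and `OutputSerialization` arguments of the request. Rejecting unknown compression types up front keeps a typo from turning into a failed request.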
With S3 Select, simple SQL queries accelerate access to your S3 data, which improves performance and reduces cost. And because StorageGRID now supports S3 Select, it is ready for third-party software solutions as they integrate the S3 Select API, so those applications gain efficiency and performance automatically. Check out this demo to learn how S3 Select works on StorageGRID.