Tech ONTAP Blogs

Moving on from Hadoop: Modernizing Data Analytics with Dremio and StorageGRID



Dremio + StorageGRID architecture


Hadoop is a common data analytics platform still used today by many enterprises. While Hadoop is a powerful tool for processing big data, it faces several pain points in the modern enterprise. Hadoop workloads can be migrated to a Dremio and StorageGRID joint solution for a modernized and future-proof data analytics platform with improved performance, scalability, security, and simplicity.


Dremio, an open data lakehouse, offers significantly better performance than legacy data lake SQL engines like Apache Hive and Impala. To understand how data analytics platforms perform on object storage, NetApp ran the industry-standard TPC-DS benchmark with a 1 TB dataset against Hive + Hadoop and against Dremio, both backed by StorageGRID object storage. In a test of identically sized clusters, Dremio completed the benchmark over 23 times faster than Hive + Hadoop! This performance improvement not only increases the efficiency of the data analytics platform, but also removes the need for local data copies that are difficult to track and manage. Dremio uses query-acceleration technology to achieve interactive-speed response times, and supports Columnar Cloud Caching (C3), which uses NVMe SSD technology built into cloud compute instances to achieve NVMe-level I/O performance. These tools, and more, make Dremio the world’s fastest lakehouse engine, and this performance opens the data lake to powerful BI and data analysis.


In Hadoop clusters, data nodes both run tasks and store blocks of data. This coupling of storage and compute often leads to underutilized resources. In a modern data infrastructure with StorageGRID and Dremio, compute and storage resources are independently scalable. To increase compute or storage, simply upgrade or add nodes to your Dremio or StorageGRID cluster. Both Dremio and StorageGRID support simple and low-touch expansions, making it easy to build an efficient data analytics platform with room for growth.
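
To make the underutilization point concrete, here is a rough sizing sketch in Python. The per-node capacities and workload figures are hypothetical and chosen only for illustration; they are not taken from any NetApp or Dremio sizing guide. The sketch compares a coupled cluster, where every node adds both compute and storage, with a decoupled one, where each tier is sized on its own.

```python
import math
from typing import Tuple


def coupled_nodes(need_cores: int, need_tb: int,
                  cores_per_node: int, tb_per_node: int) -> int:
    """Coupled design: each node brings compute AND storage, so the cluster
    must be sized for whichever resource needs more nodes."""
    return max(math.ceil(need_cores / cores_per_node),
               math.ceil(need_tb / tb_per_node))


def decoupled_nodes(need_cores: int, need_tb: int,
                    cores_per_node: int, tb_per_node: int) -> Tuple[int, int]:
    """Decoupled design: compute and storage tiers are sized independently.
    Returns (compute_nodes, storage_nodes)."""
    return (math.ceil(need_cores / cores_per_node),
            math.ceil(need_tb / tb_per_node))


# Hypothetical workload: storage-heavy, with modest compute needs.
NEED_CORES, NEED_TB = 64, 960
CORES_PER_NODE, TB_PER_NODE = 32, 48

n = coupled_nodes(NEED_CORES, NEED_TB, CORES_PER_NODE, TB_PER_NODE)
provisioned_cores = n * CORES_PER_NODE
print(f"coupled: {n} nodes, {provisioned_cores - NEED_CORES} cores idle")

compute, storage = decoupled_nodes(NEED_CORES, NEED_TB,
                                   CORES_PER_NODE, TB_PER_NODE)
print(f"decoupled: {compute} compute nodes, {storage} storage nodes")
```

With these made-up numbers, the coupled cluster has to buy compute it never uses just to reach the required capacity, while the decoupled layout adds only what each tier needs.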


Hadoop faces inherent security flaws in the modern enterprise, including weak authentication protocols and the lack of native encryption support. Dremio and StorageGRID come with modern security built in. With StorageGRID features like identity federation; grid-, bucket-, and object-level encryption; S3 Object Lock support; and S3 Versioning support, data can be stored securely without extra time and effort. Migrating from Hadoop to StorageGRID and Dremio allows enterprises to take data security for granted.
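
As a minimal sketch of what using two of these features might look like from a client, the Python helpers below call the standard S3 operations for versioning and Object Lock through a boto3-style client. The helper names, bucket names, and endpoint URL are hypothetical, and this assumes the grid exposes those standard S3 calls and that any grid-level prerequisites for Object Lock are already met; check the StorageGRID documentation for your release before relying on it.

```python
def enable_versioning(s3, bucket: str) -> None:
    """Turn on S3 Versioning for an existing bucket.

    `s3` is assumed to be a boto3-style S3 client.
    """
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def create_object_lock_bucket(s3, bucket: str) -> None:
    """Create a bucket with S3 Object Lock enabled at creation time."""
    s3.create_bucket(Bucket=bucket, ObjectLockEnabledForBucket=True)


# Example wiring (requires boto3; the endpoint below is a placeholder):
#
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://storagegrid.example.com")
#   enable_versioning(s3, "analytics-data")
#   create_object_lock_bucket(s3, "analytics-archive")
```

Passing the client in as a parameter keeps the helpers independent of how credentials and endpoints are configured in a given environment.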


Creating and maintaining an effective Hadoop cluster requires a Hadoop expert, and maintenance can be time-consuming as the cluster grows. Both StorageGRID and Dremio can be administered through modern web-based UIs, and both tools offer powerful API support. Ease of administration and self-service data analytics save time and simplify the workload of busy IT teams.


A modern data analytics platform built on StorageGRID and Dremio provides immense advantages over a legacy Hadoop cluster. With lightning-fast query performance, independent and low-touch scaling of compute and storage, built-in and future-proof security, and ease of administration, StorageGRID and Dremio offer a modern alternative to Hadoop.


To learn more about Dremio and StorageGRID, download the solution brief and watch the webinar where NetApp Active IQ Technical Director Aaron Sims shares his experience building a data lake with StorageGRID and Dremio.