Run Big Data Analytics Natively On NFS Data

By Mike McNamara, Sr. Manager, Product Marketing, NetApp


While Hadoop has been used mainly on incoming, external data, there’s also been a need to use it on existing, internal data, typically stored in network-attached storage (NAS). However, using Hadoop on internal data like this has a downside. Typically, it requires setting up another storage silo to host the Hadoop Distributed File System (HDFS) and then running the Hadoop analytics on that storage. This adds data management overhead, introduces inefficiencies, and incurs the cost of moving data between NAS and HDFS.


This problem can be addressed with the NetApp NFS Connector for Hadoop, which allows analytics software to use NetApp clustered Data ONTAP®. The connector works with Apache Hadoop and Apache Spark: a simple configuration file change enables them to analyze data on NFSv3 storage in place. By using clustered Data ONTAP, the connector decouples analytics from storage while retaining the benefits of NAS. For even higher performance, the NetApp NFS Connector for Hadoop can be combined with Tachyon to build a scale-out caching tier that is backed by clustered Data ONTAP.
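To give a rough idea of what such a configuration change could look like, the Python sketch below generates a Hadoop-style core-site.xml fragment that points the default file system at an NFS server. The property values, the server name, and the connector class name are assumptions made for illustration and are not taken from this article; the connector's own documentation is the authority on the exact settings.

```python
import xml.etree.ElementTree as ET

# Assumed settings for illustration only; verify names and values against
# the connector's documentation before using them.
NFS_PROPERTIES = {
    # Hypothetical NFS server; 2049 is the standard NFS port.
    "fs.defaultFS": "nfs://nfs-server.example.com:2049",
    # Assumed class name for the connector's file system implementation.
    "fs.nfs.impl": "org.apache.hadoop.fs.nfs.NFSv3FileSystem",
}


def build_core_site(properties):
    """Return a Hadoop-style <configuration> XML string for the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Pretty-print when the running Python version supports it (3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_core_site(NFS_PROPERTIES))
```

The point of the sketch is that the change stays at the configuration level: the analytics jobs themselves are not rewritten, only the file system they are pointed at.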


[Figure: NetApp Solutions for Hadoop and NFS Connector for Hadoop]



You can use NetApp NFS Connector for Hadoop to run big data analytics on NFSv3 data without moving the data, creating a separate analytics silo, or setting up a Hadoop cluster. You can start analyzing existing data with Hadoop right away. You can also use NFS Connector to run a proof of concept, then set up a Hadoop cluster using NetApp Solutions for Hadoop for data from external sources.


NFS Connector lets you swap out HDFS for NFS or run NFS alongside HDFS. NFS Connector works with MapReduce for compute or processing and supports other Apache projects, including HBase (columnar database) and Spark (processing engine compatible with Hadoop). These capabilities let NFS Connector support diverse workloads, including batch, in-memory, and streaming.
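Running NFS alongside HDFS relies on the way Hadoop selects a file system implementation from the URI scheme of a path, looking up a configuration key of the form fs.<scheme>.impl and falling back to fs.defaultFS for paths without a scheme. The Python sketch below models that lookup in simplified form. The host names and the NFS class name are assumptions for illustration, and real Hadoop also discovers implementations by other means, so treat this as a conceptual model rather than the actual resolution code.

```python
from urllib.parse import urlparse

# Illustrative configuration: HDFS stays the default, NFS is added alongside it.
CONF = {
    "fs.defaultFS": "hdfs://namenode.example.com:8020",  # hypothetical host
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "fs.nfs.impl": "org.apache.hadoop.fs.nfs.NFSv3FileSystem",  # assumed class name
}


def resolve_filesystem(path, conf):
    """Return the implementation class name that would serve the given path."""
    scheme = urlparse(path).scheme
    if not scheme:
        # Paths without a scheme use the scheme of the default file system.
        scheme = urlparse(conf["fs.defaultFS"]).scheme
    key = "fs.{}.impl".format(scheme)
    if key not in conf:
        raise ValueError("no file system configured for scheme: " + scheme)
    return conf[key]


if __name__ == "__main__":
    for p in ("/warehouse/events",
              "hdfs://namenode.example.com:8020/warehouse/events",
              "nfs://filer.example.com:2049/vol/logs"):
        print(p, "->", resolve_filesystem(p, CONF))
```

Under this model, swapping HDFS out for NFS amounts to changing fs.defaultFS to an nfs:// URI, while running both side by side means keeping the HDFS default and addressing NFS data with explicit nfs:// paths.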