CDMI Emerges as the New NFS – A Big Deal for Big Data

NetApp announced (http://www.netapp.com/us/company/news/news-rel-20120809-203086.html) support for SNIA’s CDMI standard in its Distributed Content Repository Solution, based on StorageGRID and E-Series Storage. Why is this such a big deal?

 

A significant challenge in managing large amounts of data (or Big Data) is a lack of what I like to call “total data awareness”: you know (or suspect) that you have the data, you just can’t find it. Many current IT environments are not built for total data awareness. This starts with core elements of the IT infrastructure, such as file systems. Traditional file systems and access methods were not designed to store hundreds of millions or billions of files in a single namespace. As a result, admins spread data across multiple file systems, multiple shares, and complex directory structures – not because the data should be logically organized that way, but simply because of limitations in file system architectures. The issue becomes even more pressing when data sits in multiple locations, perhaps even across on-premises and off-premises, cloud-based storage.

 

Is object-based storage the answer?

 

Think about how you find data on your computer. Do you navigate complex directory structures, trying to remember the name of the file that hopefully has the data you are looking for – or have you moved on to search tools like Spotlight? Now imagine you have hundreds of millions of files, scattered across dozens or hundreds of sites. How about just searching across these sites and immediately finding the data you are looking for? Object storage technology lets you store data in objects, along with metadata that describes each object. You can then search for your data based on metadata tags (like a filename, or even better, an account number and document type) – and manage data based on policies that leverage that metadata.
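To make the idea concrete, here is a minimal sketch of a flat namespace in which objects are located by their metadata rather than by a directory path. It is a toy in-memory model, not how StorageGRID or any other product works internally; the class names, object IDs, and metadata tags (account, doc_type, site) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: an ID, the payload, and descriptive metadata."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy flat namespace: objects are found by metadata, not by path."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, data, **metadata):
        # Store the payload together with the tags that describe it.
        self._objects[object_id] = StoredObject(object_id, data, dict(metadata))

    def search(self, **criteria):
        """Return the sorted IDs of objects matching every criterion."""
        return sorted(
            obj.object_id
            for obj in self._objects.values()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


# Hypothetical sample data: three documents held at two sites.
store = ObjectStore()
store.put("obj-001", b"...", account="12345", doc_type="invoice", site="berlin")
store.put("obj-002", b"...", account="12345", doc_type="contract", site="boston")
store.put("obj-003", b"...", account="67890", doc_type="invoice", site="boston")

# Find data by what it is, regardless of where it lives.
invoices_for_account = store.search(account="12345", doc_type="invoice")
```

The same metadata could drive policies as well, for example selecting everything with site="boston" for replication to another location.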

 

However, this often means that you have to consider interfacing with your storage system through APIs, as opposed to NFS and CIFS – so your applications need to support whatever API your storage vendor offers.

 

CDMI to the rescue?

 

Today, storage vendors often use proprietary APIs. This means that application vendors would have to support a plethora of APIs from a number of different vendors, leading to a lack of commitment from application vendors to support more innovative, object-based storage architectures.

 

A key way to solve this issue is to leverage technology and standards that have been developed specifically to provide a single namespace for billions of data sets, across locations and even across managed services that might reside off-premises.

 

On the technology side, NetApp just released StorageGRID 9.0. StorageGRID was developed from the ground up to support large, distributed content repositories – managing billions of data sets and petabytes of capacity across hundreds of sites in a single namespace. With this technology, you know what data you have in your repository and you can control where this data is stored (locations, tiers, etc.).

 

On the standards side there is CDMI (http://www.snia.org/cdmi), the Cloud Data Management Interface. CDMI is a standard developed by SNIA (http://www.snia.org), the Storage Networking Industry Association, with heavy involvement from a number of leading storage vendors, including NetApp. CDMI not only introduces a standard way to move data into and out of a large-scale repository, it also enables applications to easily manage this repository and where the data sits.
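As a rough illustration of what this looks like to an application, the sketch below builds (but does not send) an HTTP PUT that would create a data object together with its metadata. The content type and version header follow the CDMI specification as I understand it; the host name, container, object name, metadata tags, and the build_cdmi_put helper are all hypothetical, and the specification version your repository expects may differ.

```python
import json
import urllib.request


def build_cdmi_put(base_url, path, value, metadata, spec_version="1.0.2"):
    """Build a CDMI-style PUT request that creates a data object.

    The request is only constructed here, not sent. spec_version is an
    assumption; use the version your repository actually supports.
    """
    body = json.dumps(
        {"mimetype": "text/plain", "metadata": metadata, "value": value}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{path}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": spec_version,
        },
    )


# Hypothetical endpoint and object; the "archive" container is assumed to exist.
request = build_cdmi_put(
    "https://storage.example.com",
    "archive/invoice-0001.txt",
    "invoice text",
    {"account": "12345", "doc_type": "invoice"},
)
```

Passing the request to urllib.request.urlopen would send it. The point is that the data and the metadata describing it travel together in one standard, vendor-neutral call.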

 

CDMI has arrived in the real world

 

NetApp StorageGRID already supported NFS and CIFS, as well as an API on top of RESTful HTTP (http://en.wikipedia.org/wiki/Representational_state_transfer). So why is NetApp adding support for CDMI? It’s very simple – we believe that standards are important and that ultimately our customers will benefit from an ecosystem of solutions built on standards. A number of companies are already working on supporting CDMI or have announced support for it, so while it is still a bit early from an adoption perspective, the momentum is clearly there.

 

CDMI is the new NFS

 

When it comes to creating and managing large, distributed content repositories, it quickly becomes clear that NFS and CIFS are not ideally suited for this use case. This is where CDMI shines, especially with an object-based storage architecture behind it that was built to support multi-petabyte environments with billions of data sets across hundreds of sites and to accommodate retention policies that can reach to “forever”. NetApp’s Distributed Content Repository solution, based on StorageGRID and E-Series storage systems, fits precisely into this space.

 

Find out more about our Distributed Content Repository solution in the solution brief here: http://media.netapp.com/documents/ds-3339.pdf

 

Read the Big Content white paper here: http://media.netapp.com/documents/wp-7161-0512.pdf

 

Watch me talk about Big Content: http://www.youtube.com/watch?v=96g98Gb_rWE

 

What are your thoughts? Have you implemented object-based storage and want to share your experience? Go ahead and leave your comments below.