CDMI Emerges as the new NFS – A Big Deal for Big Data

Do you use NFS or CIFS to access data stored in large repositories? Better watch out – there is a new kid in town!

Traditionally, large amounts of unstructured data (or Big Data) have been stored as files in file systems. Retrieving data meant knowing the file share, the directory (and sub-directories), and having at least a rough idea of the file name and extension. Increasingly, this no longer works: IT departments today already manage content repositories that store hundreds of millions or billions of files, often across many locations. As the amount and complexity of data stored in enterprises grow, it becomes increasingly important to find a better way to store, manage and retrieve this data.

A key way to solve this problem is to use technology and standards that were developed specifically to provide a single namespace for billions of data sets, spanning multiple locations and even managed services that might reside off-premises.

On the technology side, NetApp just released StorageGRID 9.0 (http://www.netapp.com/us/company/news/news-rel-20120809-203086.html). StorageGRID was developed from the ground up to support large, distributed content repositories – managing billions of data sets and petabytes of capacity across hundreds of sites in a single namespace. With this technology, you know what data you have in your repository and you can control where this data is stored (locations, tiers, etc.).

On the standards side there is CDMI (http://www.snia.org/cdmi), the Cloud Data Management Interface. CDMI is a standard developed by SNIA (http://www.snia.org), the Storage Networking Industry Association, with heavy involvement from a number of leading storage vendors, including NetApp. CDMI not only defines a standard way to move data into and out of a large-scale repository, it also enables applications to manage the repository itself, including where the data sits.
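
To make the ingest-and-retrieve idea a bit more concrete, here is a minimal Python sketch of what a CDMI client request could look like. CDMI runs over HTTP with JSON bodies and the application/cdmi-object content type. The endpoint URL, the object path, the "department" metadata key and the helper function names are all hypothetical, and the spec version shown is an assumption; check what your server supports. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request

# Assumptions for illustration only: a hypothetical endpoint and an
# assumed spec version. Substitute the values your CDMI server uses.
BASE_URL = "https://cdmi.example.com"
CDMI_VERSION = "1.0.2"


def build_put_object(path, text, metadata=None):
    """Build (but do not send) a request that stores a text data object."""
    body = json.dumps({
        "mimetype": "text/plain",
        "metadata": metadata or {},  # user metadata travels with the object
        "value": text,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": CDMI_VERSION,
        },
    )


def build_get_object(path):
    """Build (but do not send) a request that reads a data object back."""
    return urllib.request.Request(
        BASE_URL + path,
        method="GET",
        headers={
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": CDMI_VERSION,
        },
    )


put_req = build_put_object("/reports/2012/q3.txt", "hello",
                           {"department": "finance"})
get_req = build_get_object("/reports/2012/q3.txt")
```

Passing either request to urllib.request.urlopen() would send it to a real server. The point of the sketch is that the object and its metadata are addressed by a URL in one namespace, rather than by a file share plus a directory path.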

 

CDMI has arrived in the real world

NetApp StorageGRID already supported NFS and CIFS, as well as an API on top of RESTful HTTP (http://en.wikipedia.org/wiki/Representational_state_transfer). So why is NetApp adding support for CDMI? It’s very simple: we believe that standards are important and that ultimately our customers will benefit from an ecosystem of solutions built on standards. A number of companies are already working on CDMI support or have announced it, so while it is still a bit early from an adoption perspective, the momentum is clearly there.

CDMI is the new NFS

When it comes to creating and managing large, distributed content repositories, it quickly becomes clear that NFS and CIFS are not ideally suited for this use case. This is where CDMI shines, especially with an object-based storage architecture behind it that was built to support multi-petabyte environments with billions of data sets across hundreds of sites, and to accommodate retention policies that can reach to “forever”. NetApp’s Distributed Content Repository solution based on StorageGRID and E-Series storage systems fits precisely into this space.

Find out more about our Distributed Content Repository solution in the solution brief here: http://media.netapp.com/documents/ds-3339.pdf

Read the Big Content white paper here: http://media.netapp.com/documents/wp-7161-0512.pdf

Watch me talk about Big Content: http://www.youtube.com/watch?v=96g98Gb_rWE

What are your thoughts? Have you implemented object-based storage and want to share your experience? Go ahead and leave your comments below.

Comments

The problem with CDMI is the client-side apps. It would be really nice if vendors contributed to, e.g., a robust FUSE CDMI Linux

Nice article. Congratulations on your CDMI release. I agree that CDMI is where the traction will be seen in coming years.

I wanted a couple of clarifications:

>>CDMI is the new NFS

Typically NFS will always have a standard NFS client, and CDMI will not. Moreover, CDMI is based on REST, where there will be a new HTTPS handshake for every interaction. I understand that REST has its own advantages. But doesn't all this make CDMI not comparable to NFS in terms of performance? Would it be right to say that workloads for NFS will be different from workloads for CDMI, while surely some of the NFS workloads can be taken over by CDMI (which ones)?

Your valued thoughts?

Regards

Sandeep

Hi Sandeep,

Yes, I absolutely agree that there are use cases for NFS - my point on CDMI is specifically about large, distributed content repositories which are a great use case for CDMI. In my opinion, CDMI will see a similar adoption rate in this area as NFS did in others.

Thanks for your comment and giving me the opportunity to clarify.

Best, Ingo

I agree that we need more choice in the client-side apps space. This is one of the key reasons why we provide a shipping product with solid support for CDMI, so that organizations can develop against an actual, commercial CDMI server implementation.

Thanks for your comments,

Ingo

Good article, and congratulations on your CDMI release.

But CDMI as a replacement for NFS? Well, you may be stretching it a bit, at least for right now.

CDMI has much more potential than standard filer protocols. How and when CDMI encroaches on traditional filer workloads will depend on the pervasiveness of client support. Most filer client protocols are implemented as kernel file systems, whereas many of the libraries used to create CDMI implementations are application-space tools that are suitable for servers but not for most client implementations. The libraries required to build kernel-space CDMI clients simply do not exist today. There are several other issues that make CDMI somewhat incompatible with generic filer applications, but they are not insurmountable.

CDMI adds new capabilities to data management, which places CDMI in the "game changer" class of technologies. Hammering it into traditional filer workloads could be a good start, but those workloads may not be the best fit for the technology.

Companies providing CDMI servers will need to invest in client-side interfaces to CDMI in order to accelerate adoption. Unfortunately, in this business, half a solution has little to no commercial value.

cheers,

-g

Good Info.

It depends on how many organizations are really interested in using this technology or Big Data.

They also have to invest in client-side applications that can be customized to their needs in order to really use the rich features of a CDMI server.

There should be healthy competition in the market to provide the best CDMI solutions, so that CDMI technology is adopted quickly by a large customer base. Apart from NetApp, who else is working on this technology?

Regards,

Sunil

Hi Sunil,

We are working with a number of ISVs that are interested in integrating CDMI on the back-end to interface with storage systems that support CDMI. While I cannot comment on the names of these ISVs at this point, I would encourage customers to reach out to their software vendors and ask that question. Purely from a formal CDMI perspective - you can find the list of member companies in the Cloud Storage Initiative at SNIA (this is the marketing and education group supporting CDMI) on their website here: http://snia.org/forums/csi

Best Regards,

Ingo