Object Storage in Media

No one (least of all me) has to tell you that digital content is growing explosively. A major film company told me a few weeks ago that a full 4K film project can create upwards of two petabytes of content. Our sports customers generate enormous amounts of data from the dozens of full camera feeds across a stadium during a game, not to mention the consumer-generated content they are starting to accept. Since we have been freed from the costs of film and tape, we believe that the more we shoot and keep, the better the final product will be. That may be true (or not), but we do know that the cost of holding on to this growing data is starting to cause the industry real pain.

People have begun to look for less expensive ways to store all this content than the media storage they’ve been tied to for years. Not that their current systems don’t work; they do. They just become difficult to manage as they get larger and come to rely on multiple applications to keep track of where everything is stored.

One of the ideas being discussed quite often today is object storage.  Object storage combines the content data with metadata about the content and also gives it a unique identifier.  You then use the metadata to find the content data when you need it without having to remember where in the file hierarchy you stored it.  With object storage, you have no file system hierarchy to monitor and manage.  It’s all done for you. 

In some ways, object storage is similar to NFS file storage, where the file name drives the metadata that tells the system on which blocks and drives the data has been stored. The object storage “pool” is a flat address space, and the unique identifier drives the metadata that knows which blocks, disks and even geographies hold your data.
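To make the idea concrete, here is a minimal Python sketch of a flat object pool. It is a toy, in-memory illustration only, not any vendor's implementation: the names ToyObjectStore, StoredObject, put, get and find are hypothetical, and a real system would map each identifier to blocks, disks and sites rather than to a Python dictionary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the content bytes plus the metadata describing them."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory object store with a flat address space."""

    def __init__(self):
        # A single flat mapping from identifier to object; no directories.
        self._objects = {}

    def put(self, data, metadata):
        # The store, not the caller, assigns the unique identifier.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id]

    def find(self, **criteria):
        # Return the IDs of objects whose metadata matches every criterion.
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ToyObjectStore()
clip_id = store.put(b"...frame data...", {"show": "Game 7", "camera": "end-zone"})
store.put(b"...frame data...", {"show": "Game 7", "camera": "sideline"})
matches = store.find(show="Game 7", camera="end-zone")
```

Notice that the caller never chooses a path. It hands over content and metadata, gets an identifier back, and later finds the clip by describing it.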

But if you want to find a file, you need to know more about it than simply the unique identifier.  You need rich metadata to describe the object and allow you to search across the content bucket to find what you need.  And many of us have been using this system for years; it’s called the MAM system. 

MAM systems are object based, but are not object storage. The MAM applications allow users to access and search content by metadata and keep track of where the file was stored by name under the covers. Then the file system keeps track of which blocks on which disks contain the named file.

Metadata and Workflow

One of the keys to success in building any content repository is the ability to find stuff after it has been stored, not to mention the ability to delete stuff when it is no longer of value, or to age it off to cheaper storage when it is of less value or not likely to be accessed often. Without a well-reasoned metadata schema, you may not be able to easily get to the content you need and want.
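As a rough illustration of why aging and deletion depend on metadata, the sketch below decides what to do with an asset from a single metadata field, the date it was last accessed. The function name choose_action and both thresholds (90 days and seven years) are assumptions made up for this example; your own retention rules will differ.

```python
from datetime import date


def choose_action(last_accessed, today, archive_after_days=90,
                  delete_after_days=365 * 7):
    """Hypothetical policy: keep, archive, or delete based on idle time."""
    idle_days = (today - last_accessed).days
    if idle_days >= delete_after_days:
        return "delete"
    if idle_days >= archive_after_days:
        return "archive"
    return "keep"


today = date(2024, 1, 1)
recent = choose_action(date(2023, 12, 1), today)   # idle for 31 days
stale = choose_action(date(2023, 6, 1), today)     # idle for 214 days
ancient = choose_action(date(2015, 1, 1), today)   # idle for about nine years
```

If the last-accessed date was never recorded, a policy like this has nothing to work with, which is the point about populating the schema.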

MAM systems were the answer for many, although just as many have told me that they can’t find the right MAM system to solve their problems. My answer to those people has always been that MAM success depends on how well the system supports ever-changing workflows and other content processes, as well as on how completely the metadata schema fits your needs. The other bugaboo of metadata is that, regardless of how well designed the schema is, if you don’t populate it well and completely, your ability to find the content you are after will be compromised.

One of my old media clients was collecting video from both professional and amateur sources to build a repository of marketing assets. Their goal was to be able to ask the system for certain parameters and to get back every shot that might give them what they wanted. For example, they might want every shot that includes a man between 18 and 25 wearing a red swimsuit and holding or consuming their product. That required a very rich metadata schema that could be searched to find the right objects or files to pull from the repository.

The metadata schemas for many object storage offerings are not that rich.  If you want to do that kind of search, you need to build a separate metadata management system and implement it so that it stays synced with the repository.  This kind of system would allow you to do rapid search across the metadata database alone without actually making any inquiries to the object storage system.
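Here is one way such a separate metadata index might look, sketched with Python's built-in sqlite3 module. The table layout, the column names and the register_shot helper are all hypothetical; a production system would have a far richer schema and a real mechanism for staying in sync with the repository.

```python
import sqlite3

# The index lives in its own database, apart from the object storage system.
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE shots ("
    " object_id TEXT PRIMARY KEY,"
    " subject TEXT, subject_age INTEGER,"
    " clothing TEXT, product_visible INTEGER)"
)


def register_shot(object_id, subject, subject_age, clothing, product_visible):
    # Call this whenever an object is written, so the index and the
    # repository stay in sync.
    index.execute(
        "INSERT OR REPLACE INTO shots VALUES (?, ?, ?, ?, ?)",
        (object_id, subject, subject_age, clothing, int(product_visible)),
    )


register_shot("obj-001", "man", 22, "red swimsuit", True)
register_shot("obj-002", "man", 40, "red swimsuit", True)
register_shot("obj-003", "woman", 21, "blue jacket", False)

# Search the metadata alone; the object store is never queried.
rows = index.execute(
    "SELECT object_id FROM shots"
    " WHERE subject = 'man' AND subject_age BETWEEN 18 AND 25"
    " AND clothing = 'red swimsuit' AND product_visible = 1"
    " ORDER BY object_id"
).fetchall()
matching_ids = [row[0] for row in rows]
```

The search returns only identifiers. You would go to the object storage system afterward, and only for the shots you actually want to pull.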

While some object storage solutions do offer you the ability to have a rich metadata schema to help you find the content you need, they don’t have the ability to automate the media workflow processes that can speed your shows to market or reduce your costs of data management. Remember that object storage is not a replacement for MAM systems. That being said, getting your MAM systems to work with object storage is no small feat. Those MAM solutions will have to address the object storage through a system of APIs that may be unique to each different object storage solution.
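One common way to contain that API problem is an adapter layer, sketched below in Python. This is an assumption about how an integration might be structured, not a description of any particular MAM product. ObjectStoreAdapter, InMemoryAdapter and archive_clip are hypothetical names, and the in-memory class merely stands in for a vendor-specific adapter that would call that vendor's API.

```python
from abc import ABC, abstractmethod


class ObjectStoreAdapter(ABC):
    """Hypothetical common interface a MAM could code against."""

    @abstractmethod
    def put(self, key, data):
        """Store the bytes under the given key."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under the given key."""


class InMemoryAdapter(ObjectStoreAdapter):
    """Stand-in for a vendor-specific adapter."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def archive_clip(adapter, clip_name, data):
    # MAM-side code sees only the common interface, never the vendor API.
    adapter.put(clip_name, data)
    return adapter.get(clip_name)


round_trip = archive_clip(InMemoryAdapter(), "clip-1", b"abc")
```

Each additional object storage solution then costs you one new adapter rather than changes throughout the MAM, though someone still has to write and maintain that adapter.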

Many have told me that the biggest advantage of object storage for them is that it’s cheap.  Maybe it is and maybe it isn’t.  If the system uses open source software, that’s one less cost, but it adds the cost of keeping it ready to support your business.  To date, there really isn’t an open source object storage system that has all of the features you will probably need.  Are you ready to become a software development shop?

Many will also tell you that object storage works with commodity disk.  Again, is that something you are ready to support?  If you are not the size of a Google, Yahoo or Apple, it might be a real cost burden to you to maintain your own storage solutions.

In my next blog, I’ll continue by exploring object storage’s near-twin brother, cloud storage. For now, I’d like to hear what you think. Use the section below to add your input to our conversation.

Thanks!

Comments

Good Blog! Can I live without object storage?

madaniel Former NetApp Employee

Certainly! Remember that you need to identify the right storage solution for the right applications. Object storage is great for some things, but not for others. In the next blog, we'll go over how to evaluate object storage in your environment.

MD