The Two Faces of Object Storage

Object storage has become a regular topic of discussion in IT these days. Why the sudden interest? There are two separate (but related) reasons that object storage is currently being discussed. I’ll describe both in this blog, but first, some basics.

As you probably know, today’s two common methods of data storage access, file and block, have been around for many decades. They access data in different ways, and object storage brings a third way to store and retrieve data.

File-level storage is a high-level, networked form of communication between servers and storage devices in which the storage devices contain an integrated file system to store and retrieve data. To read or write data, the server specifies a pathname. The path points to a location in the file system’s directory tree hierarchy and is expressed as a string of characters in which path components, separated by delimiting characters, represent each subdirectory. A familiar example:

[Figure: File write command]

The advantage of file storage is that the server OS (or application) does not need to map the data to locations on the storage device; that responsibility is handled by the storage controller, and you only need to remember the file path. The disadvantage of file storage is that the overhead of the file system usually results in slower access to data.

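To make the pathname lookup concrete, here is a minimal Python sketch of a hierarchical namespace. It is a toy model, not how any real file system or storage controller works: the nested-dictionary tree and the `resolve` function are hypothetical, and the lookup counter only illustrates why every directory level in a path adds a step to each access.

```python
from __future__ import annotations

from pathlib import PurePosixPath


# Hypothetical in-memory directory tree: each directory is a dict that maps a
# name to either a subdirectory (another dict) or a file's contents (bytes).
def resolve(root: dict, path: str) -> tuple[bytes, int]:
    """Walk the tree one path component at a time.

    Returns the file contents and the number of lookups performed, to show
    that every directory level in the path costs an extra step.
    """
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError("expected an absolute path such as /home/alice/file")
    node = root
    lookups = 0
    for part in p.parts[1:]:  # parts[0] is the root "/"
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
        lookups += 1
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node, lookups


root = {"home": {"alice": {"reports": {"q3.txt": b"quarterly numbers"}}}}
data, lookups = resolve(root, "/home/alice/reports/q3.txt")
print(data, lookups)  # b'quarterly numbers' 4
```

Deeper paths mean more lookups; a real file system adds permission checks, caching, and on-disk metadata reads on top of this walk.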

Block-level storage, by contrast, uses a low-level logical block addressing (LBA) scheme that converts physical storage devices (e.g., disk drives) into groups of logical storage addresses. This simply means that the sectors of each disk (or tape, or SSD) are numbered sequentially, starting with LBA 0, so every sector is identified by its unique LBA number. LBAs are mapped by the server’s operating system, or sometimes by applications running on the server, and are specified in SCSI commands sent from the server to the storage devices. For example:

[Figure: SCSI block WRITE command]

The example above shows the structure of a simple WRITE command sent from the server. The server OS (or application) is responsible for mapping each device and LBA, and for keeping track of the data written to the storage devices. The primary advantage of block storage is the speed at which data can be stored and retrieved through this “raw” device interface. The disadvantage is that the server must maintain the entire map of devices and LBAs.

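The block-level model can be sketched in a few lines of Python. This is a hedged illustration rather than the command in the figure: it assumes the SCSI WRITE(10) variant and 512-byte sectors, and the `block_map` dictionary and `allocate` helper are hypothetical stand-ins for the server-side bookkeeping described above. Nothing is sent to a device; the code only builds the 10-byte command descriptor block (CDB).

```python
from __future__ import annotations

import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical blocks (many devices use 4096)


def lba_to_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of a logical block on the device; LBA 0 is the first sector."""
    return lba * sector_size


def write10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte SCSI WRITE(10) command descriptor block.

    Layout: opcode 0x2A, flags byte, 4-byte big-endian LBA, group number,
    2-byte big-endian transfer length (in blocks), control byte.
    """
    if not 0 <= lba < 2**32:
        raise ValueError("WRITE(10) LBA must fit in 32 bits")
    if not 0 <= num_blocks < 2**16:
        raise ValueError("WRITE(10) transfer length must fit in 16 bits")
    return struct.pack(">BBIBHB", 0x2A, 0, lba, 0, num_blocks, 0)


# The server, not the device, remembers which LBAs hold which data.
# Hypothetical stand-in for that server-side map: name -> (start LBA, blocks).
block_map: dict[str, tuple[int, int]] = {}


def allocate(name: str, start_lba: int, num_blocks: int) -> bytes:
    """Record where the data lives, then return the CDB to write it there."""
    block_map[name] = (start_lba, num_blocks)
    return write10_cdb(start_lba, num_blocks)


cdb = allocate("invoice-2016.pdf", start_lba=2048, num_blocks=8)
print(cdb.hex())            # 2a000000080000000800
print(lba_to_offset(2048))  # 1048576
```

If `block_map` is lost, the device still holds the sectors, but nothing records which ones belong to `invoice-2016.pdf`; that is the mapping burden the paragraph above places on the server.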

Object-level storage, like file-level storage, uses a high-level storage architecture. Unlike file storage, however, object storage does not rely on a file system hierarchy. Instead, it identifies each object by a unique identifier, such as a universally unique identifier (UUID), held in a flat namespace database that spans all storage devices in the object store, regardless of device type or location. The storage devices can be contained within a single location, but are more likely dispersed across many geographically separated data centers. Applications communicate directly with the object store through HTTP-based APIs, for example by issuing requests with a command-line tool such as cURL, as shown in the following example:

[Figure: Object PUT command]

When the object storage system receives the above “PUT” command, it stores the object under one or more UUIDs. The UUID is all the application needs to know in order to retrieve the data. By replacing LBAs with UUIDs, object storage uses a direct-access scheme similar to block-level storage, but without imposing any mapping overhead on the server.

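A minimal Python sketch of the flat-namespace idea, under stated assumptions: `FlatObjectStore`, `put`, and `get` are hypothetical names, the store is an in-memory dictionary rather than a distributed database spanning many devices, and `uuid.uuid4()` stands in for whatever identifier scheme a real product uses. The point it illustrates is that retrieval is a single key lookup, with no path to resolve and no block map kept by the client.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy object store: one flat namespace keyed by UUID, no directories."""

    def __init__(self) -> None:
        self._objects: dict[uuid.UUID, StoredObject] = {}

    def put(self, data: bytes, metadata: dict[str, str] | None = None) -> uuid.UUID:
        # The store, not the client, chooses the identifier (random UUID here).
        object_id = uuid.uuid4()
        self._objects[object_id] = StoredObject(data, dict(metadata or {}))
        return object_id

    def get(self, object_id: uuid.UUID) -> bytes:
        # One lookup regardless of how many objects are stored:
        # no directory walk and no LBA map on the client side.
        return self._objects[object_id].data

    def metadata(self, object_id: uuid.UUID) -> dict[str, str]:
        return dict(self._objects[object_id].metadata)


store = FlatObjectStore()
oid = store.put(b"scan-0001", {"record-class": "cardiology"})
print(store.get(oid))  # b'scan-0001'
```

The only thing the application keeps is `oid`; where the bytes physically live is entirely the store’s concern.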

Object storage owes its popularity to the fact that it takes the best from file and block storage while enabling capabilities not available in earlier storage architectures. These include application-programmable metadata, a namespace that can span multiple instances of physical hardware, and built-in data management functions such as replication and distribution at object-level granularity.

So, why do people care about the new features of object storage? Two primary reasons:

  • Geographic dispersal. All of the hyperscale cloud providers (Amazon Web Services, Microsoft Azure, etc.) use object storage. With object storage, cloud providers can store data in a virtually unlimited flat address space that can hold billions of objects, without the complexity of a file system hierarchy. Using policy engines, objects can be automatically replicated across geographic regions for added protection. As object storage trickles down from cloud service providers into enterprise IT, application developers often use it to build global content distribution networks for their customers and employees. If you are responsible in IT for managing multiple data centers that serve a geographically dispersed employee and customer base, you are likely considering object storage as a way to meet business objectives.
  • Rich metadata. A byproduct of some vendors’ object storage designs is rich, programmable metadata that provides contextual information about the data in each object. Block storage contains no inherent metadata, and in file-based storage systems, metadata is limited to file attributes. Metadata in object storage systems, on the other hand, can be enriched with custom attributes defined by the application. By classifying objects with similar characteristics, such as medical patient records, and recording those classifications in the object metadata, objects can be quickly grouped, retrieved, and analyzed. For this reason, object storage has traditionally been popular in media-rich analytic environments, such as healthcare and energy. Lately, object metadata has found new uses in traditional IT. For instance, policy engines can act on metadata classifications: data classified as high importance can be stored on high-performance devices, while less critical data can be automatically moved to a less expensive storage tier.
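Both points meet in a policy engine that reads object metadata. The Python sketch below is a hypothetical toy, not any vendor’s policy language: the metadata keys (`importance`, `type`), tier names, and region names are invented for illustration. Rules are checked in order and the first match decides both the storage tier and the set of regions the object is replicated to; `group_by` shows how a shared classification lets objects be pulled together.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ObjectRecord:
    key: str
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Placement:
    tier: str
    regions: tuple[str, ...]


# Hypothetical rules: (predicate on metadata, placement). First match wins.
RULES: list[tuple[Callable[[dict[str, str]], bool], Placement]] = [
    (lambda md: md.get("importance") == "high",
     Placement("ssd", ("us-east", "eu-west", "ap-south"))),
    (lambda md: md.get("type") == "patient-record",
     Placement("hdd", ("us-east", "eu-west"))),
]
DEFAULT = Placement("archive", ("us-east",))


def place(obj: ObjectRecord) -> Placement:
    """Pick a tier and replication regions from the object's metadata."""
    for matches, placement in RULES:
        if matches(obj.metadata):
            return placement
    return DEFAULT


def group_by(objects: list[ObjectRecord], attribute: str) -> dict:
    """Group object keys by the value of one metadata attribute."""
    groups: dict = {}
    for obj in objects:
        groups.setdefault(obj.metadata.get(attribute), []).append(obj.key)
    return groups


objects = [
    ObjectRecord("scan-17.dcm", {"type": "patient-record"}),
    ObjectRecord("q3-board.pdf", {"importance": "high"}),
    ObjectRecord("old-log.txt"),
]
for obj in objects:
    p = place(obj)
    print(f"{obj.key}: {p.tier} tier, replicated to {', '.join(p.regions)}")
print(group_by(objects, "type"))
```

Keeping placement decisions in data (the rule list) rather than in application code is what lets policies change without rewriting or moving applications.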

NetApp StorageGRID Webscale

NetApp has been a leading object storage vendor for many years. NetApp StorageGRID Webscale, now in its 10th release, is an object storage solution for large archives, media repositories, and web data stores. It is designed for the hybrid cloud and supports standard protocols, including Amazon S3 and SNIA’s CDMI, so that object applications can run either on premises or in the cloud.

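Because StorageGRID Webscale supports the Amazon S3 protocol, the generic S3 request shape should apply. The Python sketch below only constructs an S3-style PUT Object request with the standard library; it does not send it. The endpoint, bucket, and key are hypothetical, and the request is deliberately unsigned: a real S3 endpoint requires AWS Signature Version 4 authentication, which an S3 SDK or tool would normally add. User-defined metadata travels in headers prefixed with `x-amz-meta-`.

```python
from __future__ import annotations

import urllib.request

# Hypothetical endpoint, bucket, and key; substitute real values.
ENDPOINT = "https://s3.example.com"
BUCKET = "media-archive"
KEY = "videos/launch.mp4"


def build_put(data: bytes, metadata: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) an unsigned, path-style S3 PUT Object request.

    A real endpoint would also need an AWS Signature Version 4
    Authorization header, which is intentionally omitted here.
    """
    headers = {"Content-Type": "application/octet-stream"}
    for name, value in metadata.items():
        headers[f"x-amz-meta-{name}"] = value  # S3 user-defined metadata
    return urllib.request.Request(
        f"{ENDPOINT}/{BUCKET}/{KEY}", data=data, headers=headers, method="PUT"
    )


req = build_put(b"example payload", {"project": "launch", "retention": "7y"})
print(req.get_method(), req.full_url)
# PUT https://s3.example.com/media-archive/videos/launch.mp4
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so the header is stored as `X-amz-meta-project`; HTTP header names are case-insensitive, so this does not change its meaning.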

The StorageGRID Webscale policy engine provides automated data placement according to site-based performance and availability requirements, optimized for cost as data ages. Real-time auditing provides continuous and active monitoring for SLA verification and reporting. The StorageGRID Webscale data durability framework ensures data integrity and accessibility.

StorageGRID Webscale is relied upon to store and manage large-scale, distributed repositories of images, video, and records, around the clock and across the globe.

Resources:

StorageGRID Webscale: Nonstop Object Storage for Enterprise and Cloud

Accessing StorageGRID Webscale through its S3 API