Storage, Data Management, and the Hybrid Cloud

By Peter Corbett, Vice President & Chief Architect at NetApp


The concept of flexible computing in the cloud needs no introduction here. Many businesses are being built from the ground up on cloud infrastructure, both for production IT and internal IT operations. But the vast majority of large enterprises and organizations still run much of their IT operations internally. I suspect nearly all of these organizations have given some thought to how they could leverage the cloud for at least a portion of their IT needs. An IT shop looking at moving some applications to the cloud faces several challenges, however, and a number of them relate directly to data storage and transfer.


Of course, cloud providers offer storage to go along with the flexible compute capabilities they prominently feature. They have developed pricing models that account for ingest, access, and retention of data. They offer different classes of service in terms of durability (the certainty that the data you deposit will not be lost), availability (the ability to access data at any time), performance (the latency and throughput at which data can be stored or retrieved), the storage access protocols and semantics supported, and other service level attributes. Organizations with internal IT footprints face the same set of decisions about service levels, but these attributes historically have not been uniformly articulated and quantified across different storage system manufacturers. Cloud providers are driven to optimize and refine their pay-as-you-go pricing models, and this has led them to define the service levels of their different classes of service more precisely.


There are really three distinct paradigms for using storage in the cloud. In the first, cloud storage is used by a cloud-resident application (or by multiple applications that run on the same cloud-resident middleware and dataset). In the second, cloud storage is used in a simple way as an extension to, or tier of, an on-premise data store, without any active agency in the cloud; in this case the storage is used directly via external interfaces. In the third, which I think is the most interesting, the data stored in the cloud is an asynchronous replica of data stored on-premise (the opposite can also be true, and is also interesting), and the replica is directly accessible and usable by cloud-based applications and agents. This third paradigm is the hybrid model. A variant of it leverages cloud compute with co-located private storage, e.g. NetApp Private Storage.


In the hybrid model, we get tremendous flexibility. The on-premise data can be the primary store for mission-critical and business-critical applications. The cloud store can be used for disaster recovery, backup, archive, analytics, test, and many other uses. These uses share two characteristics that make them very suitable for the cloud: they need a lot of compute horsepower some of the time but not all of the time, and they can work off a recent point-in-time image of the data that may be slightly behind the on-premise version. For applications and use cases that have these two characteristics, the cloud can offer compelling benefits.


To really achieve full portability of data to and from the cloud, there are three areas that will be foci for innovation:

  1. A quantitative, normalized description of storage service levels that can be compared across the spectrum of cloud vendors and on-premise and co-located storage systems.
  2. A means of evaluating the entire cost of storage and the cost of data movement to select the placement that optimizes cost for the value delivered.
  3. An efficient mechanism for moving point-in-time images to and from the cloud, including both bulk and granular transfers. Here, efficient means moving less data with less impact on the on-premise systems and lower compute and storage costs in the cloud.  The more efficient the image transfer, the more economical it will be to leverage the capabilities of the cloud. 
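
To make the third point concrete, here is a minimal Python sketch of a granular transfer between two point-in-time images: only the blocks that differ are sent, and the receiver rebuilds the new image from the old one plus that delta. This is an illustration under stated assumptions, not a description of any vendor's replication mechanism. The function names, the 4 KiB block size, and the hash-comparison approach are all hypothetical choices made for the example; a production system would more likely track changed blocks in snapshot metadata than rehash whole images.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes, chosen only for illustration


def block_digests(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size block of a point-in-time image."""
    return [
        hashlib.sha256(image[i:i + block_size]).digest()
        for i in range(0, len(image), block_size)
    ]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> new contents for every block that differs from the old image."""
    old_digests = block_digests(old, block_size)
    delta = {}
    for index, digest in enumerate(block_digests(new, block_size)):
        # A block is sent if it is new (past the end of the old image) or changed.
        if index >= len(old_digests) or old_digests[index] != digest:
            start = index * block_size
            delta[index] = new[start:start + block_size]
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new image on the receiving side from the old image plus the delta."""
    # Start from the old image, truncated or zero-padded to the new length.
    image = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        image[start:start + len(block)] = block
    return bytes(image)
```

In this sketch, the number of bytes moved is the sum of the block lengths in the delta, so an image in which a single block changed costs one block of transfer rather than a full copy. That is the sense of "moving less data" in the list above.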


The cloud is a transformation of IT that will continue to impact the way things are done on-premise. On-premise data centers will not disappear, at least not any time soon. They will adopt the same technologies as are used in the cloud, both to raise internal efficiency to match that of the cloud, and to participate more fully in hybrid deployments that span cloud, cloud co-located, and on-premise infrastructure. It’s going to be interesting to see how this all plays out, but there’s no doubt that the cloud will continue to play a large and growing role in IT in the coming years. Data management is one area where the rapid evolution of the cloud, in conjunction with a large continuing on-premise IT footprint, presents some of the most interesting technical challenges we face in storage today.


Great insights, Peter. I have seen this in the context of mid-size to large enterprises that have a significant in-house IT presence. Their critical data assets reside internally. For them, it behooves us (as vendors) to empower their IT to leverage external cloud as an extension of internal IT rather than an alternative to it. NetApp data federation technology, integrated with multi-cloud application deployment technology from vendors (like ITapp), can help them do so.