Evolving into an Agile Data Infrastructure

NetApp’s announcement this week of the agile data infrastructure ushers in a new era of data storage scalability, management, and intelligence. Culminating 20 years of development, this announcement leverages many of the technologies that have been a big part of NetApp’s success over the past two decades, and also brings several new technologies to the forefront.

 

Why is an agile data infrastructure important? Two reasons: monumental data growth and the opportunities that this data brings to businesses. Data, of course, has been growing steadily for the past 50 years, but the orders of magnitude today are simply staggering. The creation, transmittal, processing, and storage of data have reached epic proportions. In simple terms, a 50% annual growth rate compounded over ten years equates to a data growth factor of roughly 58x within this decade (1.5^10 ≈ 57.7). The new abundance of data has fueled an even greater thirst among users, who demand more granularity and increased regularity of data flow.
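For readers who want to check the arithmetic, here is a minimal sketch of the compounding behind that figure. The function name `growth_factor` is my own, and the 50% rate and ten-year horizon are simply the assumptions stated above, not measured values.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Compound growth factor after `years` at `annual_rate` (0.5 means 50%)."""
    return (1.0 + annual_rate) ** years


# Assumed inputs from the text: 50% annual growth over one decade.
factor = growth_factor(0.5, 10)
print(f"{factor:.1f}x")  # prints "57.7x", which rounds to the 58x quoted above
```

Changing the assumed rate shows how sensitive the result is: at 40% annual growth the same decade yields roughly 29x, about half as much.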

 

It is becoming clear that we are reaching a collective inflection point where data growth either becomes an overwhelming burden to IT or becomes a fuel to propel business innovation. Successfully moving beyond the inflection point requires a new way of thinking, and a new data infrastructure that supports historic growth levels while containing costs and avoiding complexity.

 

NetApp’s agile data infrastructure is defined by three overall categories: Intelligent, Immortal, and Infinite. Within these three categories are nine specific technologies that together create the underpinnings of data agility. For detailed information on the categories and technologies, keep an eye on my blog “Ask Dr Dedupe,” as over the next few months I’ll be providing deep dives into each.

For the purpose of this blog, however, we’ll take a higher-level view. Beyond the specific technologies, Data ONTAP 8 and OnCommand 5 are the flagships of NetApp’s agile data infrastructure. This single-platform approach is critical to attaining agility for several reasons. First, with a single architecture embedded within a group of storage arrays, all the arrays speak a common language and communicate seamlessly, without translation. The storage arrays within the agile infrastructure have exactly the same set of capabilities, and can execute these capabilities in unison. This is simply not possible in an infrastructure with disparate storage arrays using different architectures with dissimilar capabilities.

 

Next, a single architecture means that administrators can “learn it once” as their data infrastructures grow from terabytes to petabytes – resulting in operational simplicity.  Storage administrators become agile themselves as their learned activities allow them to become adept at managing ever-larger storage pools.  With agility, monumental data growth is not destined to lead to monumental complexity.

 

Finally, a single architecture provides a standardized protocol between the physical storage layer and the data management layer of the infrastructure, simplifying the design of data management APIs. This leads to an application/storage ecosystem that is efficient and, more importantly, adaptable. Diverse workloads with varying performance and capacity requirements can co-exist within the same infrastructure.

Data management and automation are crucial in this agile environment. Requiring humans to perform repetitive, mundane tasks such as provisioning, load balancing, and protection is a barrier to agility. Instead, a fully automated policy creation and enforcement system allows IT architects and administrators to shift their focus away from routine tasks and towards innovation.
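To make the idea of automated policy enforcement a little more concrete, here is a purely illustrative sketch. It is not NetApp code and does not use any Data ONTAP or OnCommand API; the `Volume` and `Policy` classes, the `enforce` function, and the 80% threshold are all hypothetical names and values I made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    # Hypothetical, simplified view of a storage volume.
    name: str
    used_gb: float
    size_gb: float
    snapshots_enabled: bool


@dataclass
class Policy:
    # Hypothetical policy: grow volumes that are too full, require protection.
    max_utilization: float = 0.8
    grow_step: float = 1.25
    require_snapshots: bool = True


def enforce(policy: Policy, volumes: List[Volume]) -> List[str]:
    """Apply the policy to every volume and return a log of actions taken."""
    actions = []
    for vol in volumes:
        # Provisioning rule: grow any volume above the utilization threshold.
        if vol.used_gb / vol.size_gb > policy.max_utilization:
            vol.size_gb = round(vol.size_gb * policy.grow_step, 1)
            actions.append(f"grow {vol.name} to {vol.size_gb} GB")
        # Protection rule: turn on snapshots wherever they are missing.
        if policy.require_snapshots and not vol.snapshots_enabled:
            vol.snapshots_enabled = True
            actions.append(f"enable snapshots on {vol.name}")
    return actions


volumes = [Volume("vol1", 90, 100, True), Volume("vol2", 10, 100, False)]
for action in enforce(Policy(), volumes):
    print(action)
```

The point of the sketch is the shape of the workflow: an administrator states the desired outcome once, and the system applies it to every volume, however many there are.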

 

Innovation is essential in achieving the ultimate goal of the agile data infrastructure: facilitating the maximum usefulness and highest value of enterprise data. In examining the data life cycle, creation, retention, and preservation are equally important, and therefore should be weighted equally when constructing a data infrastructure. In that light, I have been working (along with others) with researchers at the University of California, San Diego to help define enterprise data growth and a taxonomy that reveals the importance of measuring all aspects of enterprise data.

 

The goal of this continuing research project is to define the overall nature of enterprise data and to assist in understanding how to measure its relative value. This will eventually help organizations decide where to make investments that will provide the greatest return on their data assets. For more information on the work being done at UCSD, see their research report "Defining a Taxonomy of Enterprise Data Growth."