What is Policy-Based Enterprise Data Management?

By Larry Freeman, Data Storage Technologist, NetApp

 

I don’t think this will be a big surprise to anyone in IT – but data is growing at a monumental rate.  I was reading an analyst report the other day that said enterprise data is growing at an average of 50% per year.  This seems to be the industry consensus, so I thought I’d do some quick math and see just what that meant for data capacities.  Turns out the math showed some startling numbers.  If you started 2010 with just 100TB of online data storage, with 50% yearly growth you’ll be sitting on 5.8PB of data in 2020!  And if you started the decade with a petabyte, get ready for 58PB by 2020.
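For anyone who wants to check that math, here is the quick back-of-the-envelope calculation – a minimal Python sketch, assuming a flat 50% compound growth rate:

```python
# Compound growth: capacity after n years = starting capacity * (1 + rate) ** n
start_tb = 100        # 100TB of online storage at the start of 2010
rate = 0.50           # 50% growth per year
years = 10            # 2010 through 2020

capacity_tb = start_tb * (1 + rate) ** years
print(f"{capacity_tb:,.0f} TB (~{capacity_tb / 1000:.1f} PB)")  # -> 5,767 TB (~5.8 PB)
# Start the decade with 1PB instead, and the same math lands you at ~58PB.
```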

 

I’ve been saying for a while now (it’s in my book) that it’s not the amount of data that is becoming the problem (we’ll just keep making denser HDDs and SSDs); it’s the management of this enterprise data that will drive us crazy.  We are at the point where human-managed data just isn’t practical anymore – and for some it’s already just plain impossible.

 

Why do I say this?  First, let’s define some decisions that storage administrators have to make when it comes to enterprise data management:

 

1) Growth - When I provision capacity for this application, will it have slow growth, or fast growth?  How much capacity should I allocate?

2) Efficiency - Do I want to pack all my data as tightly as possible, or am I concerned this might affect performance?

3) Speed - Does this application have the need for speed, or will any old network pipe do?

4) Reliability - Will the world stop spinning (well, at least my world) if I lose access to this data?  Or could I live without it for a few hours (or days)?

5) Security - Am I worried about anyone else seeing this data?  Or, like my kid’s Facebook page, the more the merrier?

 

Using the criteria above, we could build a simple table that looks something like this:

 

Tiered Storage Model

| Attribute       | Tier 3 | Tier 2 | Tier 1 |
|-----------------|--------|--------|--------|
| Capacity Growth | Good   | Better | Best   |
| Efficiency      | Good   | Better | Best   |
| Speed           | Good   | Better | Best   |
| Reliability     | Good   | Better | Best   |
| Security        | Good   | Better | Best   |

3 Choices

 

We could just define 3 tiers of storage and plop our enterprise data into a storage system in the appropriate tier.  Easy, huh?  Well, yes, and that’s pretty much how storage has been provisioned for the past 20 years or so.  Buy a bunch of different storage arrays, label them as either tier 1, 2, or 3, and assign your applications accordingly.

 

But, unfortunately, simple methods aren’t always the most efficient.  What if, for instance, you have an application with high security needs but low speed and growth requirements?  In the tiering model above, any attribute with a “high” value overrides everything else in the matrix, so that one security requirement forces the whole application onto Tier 1 storage it mostly doesn’t need – an inefficiency baked into a simple 3-tier model.
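To make that concrete, here is a minimal sketch of the “most demanding attribute wins” placement rule (the attribute names and levels are my own illustration, not any particular array’s logic):

```python
# Simple 3-tier placement: the single most demanding attribute decides the tier,
# so one "high" requirement drags the whole application onto expensive Tier 1.
RANK = {"low": 0, "medium": 1, "high": 2}
TIER = {"low": "Tier 3", "medium": "Tier 2", "high": "Tier 1"}

def assign_tier(profile):
    most_demanding = max(profile.values(), key=lambda level: RANK[level])
    return TIER[most_demanding]

app = {"growth": "low", "efficiency": "low", "speed": "low",
       "reliability": "low", "security": "high"}
print(assign_tier(app))  # -> Tier 1, even though only security is demanding
```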

 

To solve this problem, let’s look at a more advanced model:

 

Policy-Based Storage Model

| Attribute       | Don't Care | Nice to Have | Required |
|-----------------|------------|--------------|----------|
| Growth Capacity |            |              | ✓        |
| Efficiency      |            |              | ✓        |
| Speed           |            | ✓            |          |
| Reliability     | ✓          |              |          |
| Security        | ✓          |              |          |

243 Possible Combinations

 

In this model, each of the 5 attributes can be specified independently.  Now, instead of 3 rigid tiers, 243 different profiles can be created by moving the check marks back and forth for each attribute.  A little more complicated, but certainly more efficient than the first model.  This model incorporates the ability to turn features on and off.  Want efficiency?  Turn on dedupe and thin provisioning.  Need guaranteed capacity and speed?  Turn off thin provisioning and add cache.  In the model above, enterprise data management could still be done manually by humans, but it requires a little more thought.  But things get really complicated rather quickly.  To illustrate, let’s see what happens when a few more options are added to each category:
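Before looking at that, here is a minimal sketch of how a 5-attribute profile might drive those on/off decisions (the mapping from levels to features is my own illustration, not a specific product’s behavior):

```python
# Each of the 5 attributes is set independently to one of 3 levels;
# the chosen levels translate into features being switched on or off.
ATTRIBUTES = ["growth", "efficiency", "speed", "reliability", "security"]
LEVELS = ["dont_care", "nice_to_have", "required"]

def features_for(profile):
    """Translate a profile into on/off feature settings (illustrative only)."""
    return {
        # Want efficiency? Turn on dedupe and thin provisioning.
        "dedupe": profile["efficiency"] != "dont_care",
        # Need guaranteed capacity? Turn thin provisioning back off.
        "thin_provisioning": (profile["efficiency"] != "dont_care"
                              and profile["growth"] != "required"),
        # Need speed? Add cache.
        "cache": profile["speed"] == "required",
    }

profile = {"growth": "required", "efficiency": "required",
           "speed": "nice_to_have", "reliability": "dont_care",
           "security": "dont_care"}
print(features_for(profile))
print(len(LEVELS) ** len(ATTRIBUTES))  # 3**5 = 243 possible profiles
```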

 

Automated Policy-Based Storage Model

| Attribute / Option      | Don't Care | Nice to Have | Required |
|-------------------------|------------|--------------|----------|
| Growth Capacity         |            |              |          |
|   Lots of Free Capacity |            | ✓            |          |
|   System Clustering     | ✓          |              |          |
|   Performance Scaling   |            | ✓            |          |
| Efficiency              |            |              |          |
|   Deduplication         |            |              | ✓        |
|   Thin Provisioning     |            | ✓            |          |
|   Data Compression      | ✓          |              |          |
| Speed                   |            |              |          |
|   Fast CPU              |            | ✓            |          |
|   Intelligent Cache     |            | ✓            |          |
|   Media Tiering         |            | ✓            |          |
| Reliability             |            |              |          |
|   Replication           | ✓          |              |          |
|   Mirroring             | ✓          |              |          |
|   D2D Backup            |            | ✓            |          |
| Security                |            |              |          |
|   Secure Multi Tenancy  |            |              | ✓        |
|   Encryption at Rest    | ✓          |              |          |
|   Encryption to Tape    | ✓          |              |          |

14,348,907 Possible Combinations

 

Believe it or not, in this model there are over 14 million possible combinations (3^15 = 14,348,907) based on only 15 selections with 3 choices each.  This is why policy-based automation is quickly becoming a requirement in today’s enterprise data management environments.  With a policy engine in place, a desired data profile can be automatically mapped to the most appropriate storage resource, the appropriate software functionality can be automatically enabled, and a monitoring system can alert us to any out-of-policy conditions.
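As a rough illustration of what such a policy engine does – a generic sketch, not WFA’s actual interface; the pool names and capability sets are invented for the example:

```python
# Toy policy engine: pick the first storage pool that offers every "required"
# option, enable those features, then watch for out-of-policy drift.
REQUIRED, NICE_TO_HAVE, DONT_CARE = "required", "nice_to_have", "dont_care"

pools = [
    {"name": "pool-a", "capabilities": {"deduplication", "thin_provisioning", "replication"}},
    {"name": "pool-b", "capabilities": {"deduplication", "secure_multi_tenancy", "intelligent_cache"}},
]

def provision(profile):
    required = {opt for opt, level in profile.items() if level == REQUIRED}
    for pool in pools:
        if required <= pool["capabilities"]:   # pool satisfies every required option
            return {"pool": pool["name"], "enabled": sorted(required)}
    raise RuntimeError("no pool satisfies the required options")

def out_of_policy(profile, placement):
    """Return any required option that is no longer enabled on the placement."""
    required = {opt for opt, level in profile.items() if level == REQUIRED}
    return sorted(required - set(placement["enabled"]))

profile = {"deduplication": REQUIRED, "secure_multi_tenancy": REQUIRED,
           "intelligent_cache": NICE_TO_HAVE, "replication": DONT_CARE}
placement = provision(profile)
print(placement)                          # -> placed on pool-b with both features enabled
print(out_of_policy(profile, placement))  # -> [] while everything stays in policy
```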

 

This is precisely the objective of NetApp’s OnCommand Workflow Automation, or WFA for short.  This data management solution lets you create a true policy-based environment where you decide the characteristics of the data, and the policies decide where the data should go and what features should be turned on or off.

 

What’s left for us humans to do?  Nothing really – other than listen to the machines hum and wait for some sort of alert that pops up if something goes out-of-policy.  Sort of like an airline pilot listening to the jet engines hum as autopilot actually flies the plane.  I’ll talk more about the pilot’s - er, storage architect’s - role in policy-based storage management - and I’ll share some examples of where WFA is in use in an upcoming blog…

 

DrDedupe