
"A Little Bit of Flash Goes a Long Way...": Bye-Bye Automated Storage Tiering

There are many ways to make your IT infrastructure more efficient and effective. One of them is storage (or data) tiering. It seems a reasonable technique for IT departments to implement, so why has establishing storage tiering been such a struggle?

When IBM introduced the concept in the early 1980s, storage tiering was known as hierarchical storage management (HSM). In the 1990s, it was called information lifecycle management (ILM). In its latest incarnation it is called automated storage tiering (AST), with the emphasis on "automated," an approach Compellent introduced in 2005.

Figure 1.

Although the name has evolved, the fundamental principle has not: storage tiering is based on the outdated and operationally inefficient methodology of moving data from high-performing, very expensive media to high-capacity, less-expensive media, with the goal of maximizing the utilization of the storage infrastructure. But whatever it is called, storage tiering is a project that IT organizations never seem to finish.

The problem is that most storage tiering solutions available today are overly complex, cumbersome to implement, and still based on an antiquated methodology (see Figure 1).

The basic premise of storage tiering is to store data in the right place, at the right time, and at the right price to support the enterprise. It’s about using the storage infrastructure efficiently.

Underpinning the premise of storage tiering are three basic assumptions:

• The value of data decreases over time; according to some estimates, data not accessed within 90 days will almost never be accessed again.
• It is estimated that less than 20% of all data needs to be on high-performance (and therefore expensive) media.
• It’s a reasonable strategy to move as much data as possible to high-capacity (SATA) drives as soon as possible, as a way to make better use of the data center’s storage infrastructure.
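The first assumption is the kind of rule an AST policy encodes. Here is a minimal sketch, assuming a policy that looks only at the time since each block was last accessed. The 90-day cold cutoff comes from the estimate above. The 7-day hot cutoff and the name classify_block are hypothetical, chosen only for illustration, and are not taken from any vendor's product.

```python
from datetime import datetime, timedelta

# COLD_AFTER follows the 90-day estimate quoted above.
# HOT_WITHIN is a hypothetical cutoff chosen for illustration, not a vendor default.
HOT_WITHIN = timedelta(days=7)
COLD_AFTER = timedelta(days=90)


def classify_block(last_access: datetime, now: datetime) -> str:
    """Label a block hot, warm, or cold purely by time since its last access."""
    age = now - last_access
    if age <= HOT_WITHIN:
        return "hot"
    if age < COLD_AFTER:
        return "warm"
    return "cold"  # not accessed within 90 days


if __name__ == "__main__":
    now = datetime(2012, 12, 1)
    for days in (1, 30, 120):
        print(days, classify_block(now - timedelta(days=days), now))
    # prints: "1 hot", "30 warm", "120 cold" (one per line)
```

Even this toy policy has thresholds that someone must pick, validate, and revisit as workloads change.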


So how do today’s automated tiering solutions attempt to address these three points? Let’s start with the word "automated": it sounds right, and it’s very appealing, but does it work? It’s easy to equate “automated tiering” with “intelligent tiering,” but before IT departments can automate anything, a tremendous amount of work needs to be done. The storage architect or administrator has to collect the data, analyze it, and design the correct workflows so that the system can automate them. In other words, the storage architect has to do the heavy lifting.

For instance, consider the following questions, which must be answered to architect an appropriate solution:


1. How many tiers of storage does my environment need? Vendors like Dell Compellent offer nine tiers of storage based on type of drive, rotational speed, and RAID levels.


2. How big should tier 1 be? How big should tiers 2, 3, 4 be?


3. How do I determine which data is hot, warm, or cold? Some vendors' implementations of automated storage tiering (EMC's, for example) require a good understanding of application workloads, additional software, and detailed planning and sizing of the different tiers of storage.


4. What kind of data should go in tier 1? How do I classify my data?


5. How long does it take auto-tiering software to migrate data to another tier? In some instances it takes 3 days to promote data and 12 days to demote it. Really?


6. When is critical data promoted to tier 1? Keep in mind that data migrations or relocations can affect system performance, and depending on the vendor, a migration can take anywhere from hours to days.


7. When is cold data moved to tiers 2 and 3?


8. Is the data migration process manual, automatic, or scheduled?


9. How granular is the data migration? Do I need to move a whole LUN, or just part of one (sub-LUN)?


10. How do I know if I have the right data migration policies, thresholds, or time windows for data movement? Ongoing monitoring or calibration will be required.


11. Can I use data efficiency features like deduplication and thin provisioning in my tier 1 storage layer?


12. Do I need different tiering solutions for NAS and for SAN?


13. How many new tools and management endpoints will these solutions add to my environment?


14. And perhaps the most important question of all: How much will the new solutions cost? How many licenses will I have to purchase?


Architecting the right solution depends on answering these questions correctly and on collecting and analyzing the correct data. Yet even if your data and analysis are correct, the actual implementation may be too complex. For example, EMC’s Fully Automated Storage Tiering (FAST) is a collection of things, not a specific capability: FAST on Symmetrix is different from FAST on CLARiiON or on Celerra. FAST is an umbrella term for a group of point technologies that behave differently on every platform. In other words, a successful AST implementation requires painstaking work up front and constant care and feeding during operations. It assumes predictable workloads and leaves little room for flexibility.

At NetApp, we’ve decided to take an approach that differs from traditional storage tiering. As NetApp CEO Tom Georgens said in a phone call with analysts, “The entire concept of tiering is dying. The simple fact of the matter is, tiering is a way to manage migration of data between the Fibre Channel-based system and SATA-based systems. With the advent of Flash, basically these systems are going to go to large amounts of Flash, and that will be dynamic with SATA behind them.... The whole concept of tiered storage is going to go away.”

Even EMC sees Flash as central to simplifying and changing the storage tiering paradigm. In his Oracle OpenWorld 2012 keynote, Jeremy Burton, EMC’s EVP of Product Ops and Marketing, said that “A little bit of Flash goes a long way,” and that, according to EMC’s findings, if customers deployed about 1% of their storage capacity as Flash, that Flash would serve over 50% of the IOPS.
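Burton's figure is a claim about access skew. When I/O concentrates on a small set of blocks, putting that set on Flash serves most requests. The following is a minimal sketch of how such skew could be measured from per-block access counts. It assumes equal-sized blocks, so a fraction of blocks equals a fraction of capacity. The function name io_share_of_hottest and the toy trace are hypothetical, and this is not EMC's methodology.

```python
def io_share_of_hottest(access_counts: list[int], capacity_fraction: float) -> float:
    """Fraction of all I/Os that land on the hottest `capacity_fraction` of blocks.

    Assumes every block is the same size, so a fraction of blocks
    is also a fraction of capacity.
    """
    ranked = sorted(access_counts, reverse=True)  # hottest blocks first
    n_hot = round(len(ranked) * capacity_fraction)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:n_hot]) / total


if __name__ == "__main__":
    trace = [99] + [1] * 99  # hypothetical: 1 very hot block, 99 rarely read ones
    print(io_share_of_hottest(trace, 0.01))  # prints 0.5
```

In this toy trace the hottest 1% of blocks receives exactly half of the I/O. For a real workload, the counts would have to come from measurement rather than assumption.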

 

We couldn’t agree more: Flash is making the concept of storage tiering irrelevant.

So what is the NetApp approach?


NetApp® Virtual Storage Tier (VST) is a self-managing, data-driven service layer for storage infrastructure. VST assesses workload priorities in real time and optimizes I/O requests for cost and performance without the need for complex data classification and movement. It is a simple and elegant approach to a perennial item on IT organizations’ to-do lists.


VST promotes hot data without the data movement or migration overhead associated with other approaches to automated storage tiering. Whenever a read request is received for a block on a volume or LUN where VST is enabled, that block automatically becomes subject to promotion. Promotion works on 4KB blocks, which is very granular compared with other implementations. Note that promoting a data block to the Virtual Storage Tier is not data migration, because the block remains on hard disk media when a copy is made to the VST.
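The promotion behavior described above works like a read-through cache. A read copies the block into Flash, and the authoritative copy stays on disk. The sketch below models that idea in Python. The class and method names are hypothetical. Least-recently-used (LRU) eviction is an assumption made for illustration. It is not a description of how Data ONTAP actually chooses which blocks to keep.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # promotion granularity described above (4KB blocks)


class ReadThroughFlashCache:
    """Toy model: disk holds every block; Flash holds copies of recently read ones."""

    def __init__(self, disk: dict, capacity_blocks: int):
        self.disk = disk                  # authoritative copy; never modified here
        self.capacity = capacity_blocks   # how many block copies fit in Flash
        self.flash = OrderedDict()        # block number -> cached copy, LRU order
        self.disk_reads = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.flash:
            self.flash.move_to_end(block_no)  # hit: mark as most recently used
            return self.flash[block_no]
        data = self.disk[block_no]            # miss: one disk read serves the request...
        self.disk_reads += 1
        self.flash[block_no] = data           # ...and a copy is placed in Flash
        if len(self.flash) > self.capacity:
            self.flash.popitem(last=False)    # evict the LRU copy; no write-back needed
        return data


if __name__ == "__main__":
    disk = {n: bytes(BLOCK_SIZE) for n in range(4)}
    cache = ReadThroughFlashCache(disk, capacity_blocks=2)
    for block_no in (0, 0, 1, 2, 0):
        cache.read(block_no)
    print(cache.disk_reads)  # prints 4: block 0 was evicted, so its last read hit disk
```

Because the disk copy is never moved, evicting a block in this model just drops its Flash copy, and nothing has to be written back.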


VST leverages NetApp’s key storage efficiency technologies (deduplication, volume cloning, and thin provisioning), intelligent caching, and simplified management. You simply choose the default media tier you want for a volume or LUN (SATA, FC, or SAS), and hot data from that volume or LUN is automatically promoted on demand (application driven) to Flash-based media.

In summary, the NetApp solution offers:


• Fewer tiers of storage (FC or SAS, plus Flash, SSD, or a combination)
• Intelligent placement of data (no data migration, no disk I/O consumed)
• Faster adoption of SATA HDDs (incorporate SATA media earlier in the data lifecycle)
• Application driven (the application and Data ONTAP® drive the promotion of hot data)
• No rules, templates, profiles, or complicated workflows
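The "no data migration, no disk I/O consumed" bullet is easiest to read as "no disk I/O beyond the application's own read." The sketch below tallies disk I/Os for one block that turns hot and later cools, under two deliberately simplified models. In the read-cache model, the application's read also populates Flash and eviction is free. In the migration-based tiering model, promotion and each demotion step count as a read from one disk tier plus a write to another, with tier 1 modeled as SAS/FC disk. The per-step costs, the two-step demotion (tier 1 to tier 2, then tier 2 to tier 3), and all names are illustrative assumptions. They are not measurements of VST or of any vendor's AST, and writes to Flash are not counted as disk I/O.

```python
from dataclasses import dataclass


@dataclass
class IoTally:
    """Counts of disk (HDD) I/Os; writes to Flash are deliberately not counted."""
    disk_reads: int = 0
    disk_writes: int = 0

    @property
    def total(self) -> int:
        return self.disk_reads + self.disk_writes


def read_cache_lifecycle() -> IoTally:
    """Hypothetical model: block read once from disk, copied to Flash, later evicted."""
    tally = IoTally()
    tally.disk_reads += 1  # the application's own read; its data is also copied to Flash
    # Later hits are served from Flash, and eviction discards the copy: no disk I/O.
    return tally


def migration_tiering_lifecycle(demotion_steps: int = 2) -> IoTally:
    """Hypothetical model: block migrated up to tier 1, served, then demoted tier by tier."""
    tally = IoTally()
    tally.disk_reads += 1   # promotion: read the block from the low tier...
    tally.disk_writes += 1  # ...and write it to tier 1
    tally.disk_reads += 1   # serve the application's read from tier 1
    for _ in range(demotion_steps):  # e.g. tier 1 -> tier 2, then tier 2 -> tier 3
        tally.disk_reads += 1
        tally.disk_writes += 1
    return tally


if __name__ == "__main__":
    print("read cache:", read_cache_lifecycle().total)                # prints 1
    print("migration tiering:", migration_tiering_lifecycle().total)  # prints 7
```

Under these assumptions the cache path costs 1 disk I/O and the migration path costs 7. Real systems differ in how and when they move data, so the absolute numbers will differ. The structural point is that evicting a cached copy is free, while demoting a migrated block is not.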


As you can see, achieving an efficient and effective storage infrastructure doesn’t have to be complicated, elusive, or out of reach. Automated storage tiering is a dying concept because it just doesn’t work. With the advent of SSDs and Flash technology, there is a new, better, and quite exciting way to virtually tier your data and storage. Goodbye AST, and yes, FAST too!

Comments
on 2012-12-03 06:02 AM

Seems like an odd statement to me: 'Flash is making tiering irrelevant', followed by a description of how NetApp do tiering. Just because you copy rather than move and use Flash as your top tier surely doesn't change the basic premise behind tiering: managing the balance between cost and performance. Tiering will only go away if everyone buys all-flash arrays and only all-flash arrays (and even then you'll probably have tiers within flash based on write endurance!), and we all know that's not going to happen.

I'm all for knowing how NetApp approach this tradeoff, but a bit more information and less FUD would be good.

NetApp Employee on 2012-12-03 12:57 PM

Hi Ed,

Thanks for your note and for adding to the conversation. Regarding your comment about the "odd" statement, what I'm saying is that NetApp's way of making the storage infrastructure more efficient (sure, I used "tiering" for lack of a better word) is based on the smart use of Flash, and you are correct: we prefer to "cache" data, not "move" it. Also, NetApp's approach is to accelerate the adoption of SATA. We are not proposing an all-flash array strategy (except perhaps for extreme workloads); that wouldn't make sense financially. In the end, a key takeaway is that there are meaningful differences in how different vendors implement "tiering."

Regards,

Cesar

on 2012-12-04 02:04 AM

I cannot find any FUD in this blog, just facts.

on 2012-12-04 07:40 AM

About the bullet point "Intelligent placement of data (no data migration, no disk I/O consumed)": a 4KB block is read from disk, which makes it subject to promotion, resulting in it being copied from SATA/FC disk to the VST (containing SSD). OK, sure, data is not migrated by definition, but considering the latter part of the bullet point, how is this done without consuming any disk I/O to "copy" that 4KB block up to VST and onto SSD? In addition, and this is where NetApp's VST technology has a chance to further prove the point: what is the behavior of formerly promoted blocks once they are no longer in high demand in VST, compared with other vendors' technologies? With FAST, that data would be "migrated" back down to slower disk, consuming I/O as you mention. Can you elaborate on this part, please? Thank you.

NetApp Employee on 2012-12-04 10:01 AM

Hi Nick,

Thanks for your questions. Regarding the first question about "no data migration, no disk I/O consumed," let me answer it this way: VST reads the data blocks from disk and copies them to Flash/SSD. That read is a disk I/O, but using VST imposes no incremental I/O (overhead). Compare that technique with AST's way of promoting cold blocks: first it needs to copy (migrate) the data from SATA to SAS/FC drives, and only once the blocks are on tier 1 are they served to the application. In other words, AST adds an incremental layer of "copying" data when promoting cold blocks.

Regarding your second question, VST demotes data when it's cold or no longer needed by simply evicting it from cache: no copying, no data migration. Again, compare that with AST, which still needs to move the data back to tier 2, then to tier 3, adding more I/O and overhead in the process. Hope this helps answer your questions.

Regards,

Cesar

on 2012-12-29 05:50 AM

Cool cesaro,

Can't wait to see what EMC will say about its FAST VP!

Thanks & Happy New Year

Henry

NetApp Employee on 2012-12-30 11:23 AM

Henry,

Thanks for the note, really appreciate it. One key point I'm trying to make is to highlight the differences between NetApp's approach to driving efficiency and effectiveness in data centers and other methodologies; it's about operational simplicity and creating real value.

Happy New Year and best wishes in 2013.

Cesar