The Importance of IO Density in Delivering Storage as a Service (Part 1)

Storage as a Service

 

by Eduardo Rivera, Senior Storage Architect, NetApp IT

 

This is the first blog in a series on NetApp IT’s adoption of storage service levels. To read part 2 of the series, visit The Role of QoS in Delivering Storage as a Service (Part 2).

 

Can NetApp IT deliver storage as a service?

 

NetApp IT posed this question to itself more than a year ago. Our goal was to offer our business customers a new way to consume storage, one that met not only their capacity requirements but also their performance requirements. At the same time, we wanted this storage consumption model to be presented as a predictable and easily consumable service. After consulting with enterprise architects for NetApp’s cloud provider services, we developed a storage service catalog leveraging two main elements: IO Density and NetApp clustered Data ONTAP®’s QoS (quality of service).

 

In the first part of this two-part blog, we will discuss how NetApp OnCommand Insight’s IO Density metric played a key role in the design of our storage service catalog.

 

The Role of IO Density

IO Density is a simple yet powerful idea. The concept itself is not new, but it is essential to building a sound storage consumption model. By definition, IO Density is the measure of IO generated over a given amount of stored capacity, expressed as IOPS/TB. In other words, IO Density measures how much performance can be delivered by a given amount of storage capacity. Here’s an example of how IO Density works.

 

Suppose we have a single 7.2K RPM drive. By rule of thumb, a single drive of this type can deliver around 50 IOPS @ 20ms response time. Consider, however, that 7.2K RPM drives today can range anywhere from 1TB to 8TB in size. The ability of the drive to deliver 50 IOPS does not change with its size. Therefore, as the size of the drive increases, the IOPS/TB ratio worsens (i.e., you get 50 IOPS/TB with a 1TB drive and 6.25 IOPS/TB with an 8TB drive).
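
To make that arithmetic concrete, here is a minimal sketch (in Python, using the same 50 IOPS rule of thumb from the example above) showing how the IOPS/TB ratio degrades as drive capacity grows:

```python
def io_density(iops: float, capacity_tb: float) -> float:
    """IO Density = IOPS delivered per TB of stored capacity."""
    return iops / capacity_tb

# Rule of thumb from the example: a 7.2K RPM drive delivers ~50 IOPS
# regardless of its size, so density worsens as capacity grows.
for size_tb in (1, 2, 4, 8):
    print(f"{size_tb} TB drive: {io_density(50, size_tb):.2f} IOPS/TB")

# 1 TB drive: 50.00 IOPS/TB
# 2 TB drive: 25.00 IOPS/TB
# 4 TB drive: 12.50 IOPS/TB
# 8 TB drive: 6.25 IOPS/TB
```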

 

Applying the same logic, we can divide the amount of IO that an application demands from its storage by the amount of capacity that we provision to it. The difference is that at the array level, there are many other technologies and variables at play that determine the IO throughput of a given storage volume. Elements like disk type, controller type, amount of cache, etc., affect how many IOPS a storage array can deliver. Nonetheless, the general capabilities of a known storage array configuration can be estimated with a good degree of accuracy given a set of reasonable assumptions.
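
Purely as an illustration of that estimation exercise, here is a hedged sketch (in Python; every number below is a hypothetical placeholder, not NetApp sizing guidance) of how a simple aggregate's deliverable IOPS and usable capacity might be rolled up into an IOPS/TB figure:

```python
def estimated_io_density(disk_count: int, iops_per_disk: float,
                         usable_tb_per_disk: float, cache_boost: float = 1.0) -> float:
    """Back-of-the-envelope IOPS/TB estimate for a homogeneous aggregate.

    cache_boost is a crude multiplier standing in for controller/cache effects;
    real estimates depend on controller model, cache, RAID layout, and workload.
    """
    total_iops = disk_count * iops_per_disk * cache_boost
    usable_tb = disk_count * usable_tb_per_disk
    return total_iops / usable_tb

# Hypothetical example: 24 x 4 TB NL-SAS drives at ~50 IOPS each,
# with a modest uplift from caching.
print(f"{estimated_io_density(24, 50, 4.0, cache_boost=1.5):.1f} IOPS/TB")  # ~18.8
```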

 

Using OnCommand Insight, we were able to gather, analyze, and visualize the IO Density of all the applications running on our storage infrastructure. What we found initially was surprising. Some applications that anecdotally had been marked as high performance were demonstrating very low IO Density rates, and thus were essentially wasting high-performance storage capacity. We also saw the reverse: applications pounding the heck out of lower-performance arrays because their actual IO requirements had been incorrectly estimated at the time of deployment. Therefore, we started to use NetApp OnCommand Insight’s aggregated IO Density report to profile application performance across the entire infrastructure and establish a fact-based architecture.

 

Ultimately, OnCommand Insight’s IO Density report helped us to identify the range of service levels (defined as IOPS/TB) that the apps actually needed. With this information, we created a storage catalog based on three standard service levels:

  1. Value: Services workloads requiring between 0 and 512 IOPS/TB.
  2. Performance: Services workloads requiring between 512 and 2048 IOPS/TB.
  3. Extreme: Services workloads requiring between 2048 and 8192 IOPS/TB.

Based on our understanding of our application requirements (as shown in our IO Density reports), the above three tiers would address 99 percent of our installed base. The few workloads requiring something outside these pre-defined service levels are easily dealt with on a case-by-case basis.
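
As an illustration only, here is a small sketch (in Python) mapping a measured IO Density value to one of the three service levels; the tier names and boundaries come from the catalog above, but the function itself is hypothetical and not how OnCommand Insight or ONTAP assigns tiers:

```python
def service_level(iops_per_tb: float) -> str:
    """Map a measured IO Density (IOPS/TB) to a catalog service level.

    Boundaries follow the three-tier catalog described above; anything
    beyond the Extreme ceiling is handled case by case.
    """
    if iops_per_tb <= 512:
        return "Value"
    elif iops_per_tb <= 2048:
        return "Performance"
    elif iops_per_tb <= 8192:
        return "Extreme"
    return "Exception: handle case by case"

# Example: a 7 TB volume driving 2,000 IOPS in total
print(service_level(2000 / 7))  # ~286 IOPS/TB -> "Value"
```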

 

A New Perspective on Application Performance

IO Density gave us a new perspective on how to profile and deploy our applications across our storage infrastructure. By recognizing that performance and storage capacity go hand in hand, we were able to create a storage catalog with tiers that reflected the actual requirements of our installed base.

 

Our next step was placing IO limits on volumes to prevent applications from stepping on the performance resources of other applications within the same storage array. Stay tuned for part two of this blog, where I will discuss how we used clustered Data ONTAP’s QoS feature to address this issue.

 

For more information on this topic, check out the latest edition of the Tech ONTAP Podcast. 

 

 

The NetApp-on-NetApp blog series features advice from subject matter experts from NetApp IT who share their real-world experiences using NetApp’s industry-leading storage solutions to support business goals. Want to learn more about the program? Visit www.NetAppIT.com.

Comments

Hi,

interesting post.

Quick question: Say I have an application residing on a volume with a capacity of 7 TB.

The application makes 2,000 IOPS in total, so that's less than 500 IOPS/TB, and it would be estimated to reside on the "Value" tier.

Say this is an application database with lots of static data and a segment of heavily consumed data that needs low latency, because those database queries affect the front-end user experience within the application.

From the IO Density perspective, this application would be assigned to the "Value" tier, which would result in the volume residing on, say, a low-performance SATA aggregate; queries would have high latency, ending in a bad user experience.

How do I deal with cases like these?

Thanks,

Alon

@aham_team

 

My team has brought up this same scenario. In our case, we are going to use some type of tagging method (e.g., CMDB, VMware, volume, etc.) to denote that a specific workload needs to be statically placed in a particular tier.

 

Also, from a technical side, you can still put FlashCache into your Value tier to provide the minimal amount of flash needed to satisfy that workload and others of a similar type. Meaning, where you have workloads that are mainly static but have a small amount of hot activity, a cache can help deliver the needed performance without moving them up into the Performance/Extreme tier.


Hi Alon,

 

The question you pose is a fair one, and one that I think many people will have. The bottom line is that although the model I describe above is primarily focused on IOPS/TB, I am also considering expected IO response time when building the service levels (and the physical storage infrastructure behind them). The same elements of the storage system that affect throughput (controller, cache, disk type, etc.) also affect how fast that IO can be serviced, and thus they directly affect IO latency. As a result, each service level does indeed have a latency SLO target, calculated from several assumptions and/or observations of the workloads that we intend to serve. In the case of the Value tier that we built internally, the latency target is 4ms even though the back-end disk technology is based on NL-SAS drives. How can we achieve this? The answer is simple: we leverage the right mix of controller type and FlashCache (this obviously assumes some amount of IO concurrency and a cache-friendly workload). Also keep in mind that QoS itself keeps things in check so that the system does not get overrun with rogue workloads.

 

All that said, if your workload requires an even lower IO latency, or an extremely consistent low IO latency, then it probably needs to be bumped up a tier or two even if its IO density does not seem to justify it from a pure throughput perspective. These exceptions do exist and we have seen them (even if in our environment they are rare).

 

 

 

I hope this helps.  Thanks for your comment and question.

 

Regards,

Eduardo

Hi Eduardo,

 

Sorry to drag up an old post, but we are currently looking into building a very similar storage stack (AFFs are inbound to add the Extreme tier we are currently missing) based around IO Density.

 

We have OCI, but I'm really struggling to get hold of the IO Density report you keep referring to:

 

`NetApp OnCommand Insight’s aggregated IO Density report`

 

Is this an off-the-shelf report we can get? It doesn't appear to be included in a new deployment of OCI. I've managed to get hold of some reports, but I'm not sure they are giving the info you are referring to in the report you used.

 

Do you have screenshots of what this report looks like, or how I should be looking to get hold of it?

 

Great post and podcast btw

 

Andy

I think in order to get access to the IO Density reports, you need to go through the Service Design Workshop with NetApp. Check with your account rep; I know we found the workshop extremely useful.