Flying Through Clouds 4: Using OnCommand Insight to quickly check your water levels

By Dan Chilton and Bhavik Desai, NetApp Solutions Performance Engineers

 

In the last blog, Bhavik walked us through a performance philosophy for managing a mixed application cloud.  He talked about storage as a set of buckets that we can fill with water up to certain water levels (capacity) and then add new buckets.  Whether you are a part of an enterprise IT department building private clouds or a service provider hosting tenants, we suggest your goal should be maintaining a happy storage system.  If you keep your storage systems happy, you have a platform to keep your clients happy too.  So how can we monitor the water levels to keep the buckets full but not too full? 

 

Mapping OnCommand Insight to ONTAP

 

If you know the history of OnCommand Insight, it began at a company called Onaro as SANscreen software and was acquired by NetApp several years ago.  Fast forward to today, and NetApp OnCommand Insight still does an excellent job monitoring all major storage vendor arrays, hypervisors, servers, switches, etc.  However, if you come from a NetApp storage background, you may be unfamiliar with the storage terms used in OnCommand Insight (OCI).  The OCI storage terms are generic because SANscreen was originally designed to monitor all major storage vendor arrays.  We thought it would be beneficial to take some time to map the key terms for you.

Clustered ONTAP            OnCommand Insight
-----------------------    ------------------------
Aggregate                  Storage Pool
FlexVol                    Internal Volume
LUN                        Volume
Disk/Aggregate Busy        Storage Pool Utilization
Average CPU Utilization    Utilization

 

Pay close attention to the last two terms in the chart: storage pool utilization and utilization.  These are the water levels that you should keep an eye on as you manage your storage resources.  The goal is to keep both below 50% utilized.  The third water level is capacity utilization for the storage pool (aggregate), and we suggest keeping it to a maximum of 85% full.

 

Keep in mind that these are happy-system recommendations.  There are times when these values will be exceeded and the systems will still perform well.  At other times, exceeding these values will send latency climbing up the hockey stick, and things will get painful.  By paying attention to the water levels and buckets, you can set up various pools that can be tied back to application service levels and performance tiers and (watch this video) seamlessly move volumes between them as requirements change, achieving nondisruptive operations.
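
The three water levels can be sketched as a simple check.  This is a minimal illustration only: it does not call any OCI API, the values are assumed to be read from OCI by hand, and every name in it (WaterLevels, check_water_levels, the sample pools) is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Guideline thresholds taken from the post. They are "happy system"
# targets, not hard limits enforced by ONTAP or OCI.
MAX_POOL_UTILIZATION_PCT = 50.0  # OCI "Storage Pool Utilization" (disk/aggregate busy)
MAX_NODE_UTILIZATION_PCT = 50.0  # OCI "Utilization" (average CPU utilization)
MAX_POOL_CAPACITY_PCT = 85.0     # storage pool (aggregate) capacity used


@dataclass
class WaterLevels:
    """Hypothetical container for values copied from OCI by hand."""
    name: str
    pool_utilization_pct: float
    node_utilization_pct: float
    pool_capacity_pct: float


def check_water_levels(levels: WaterLevels) -> List[str]:
    """Return one warning per water level that is outside its guideline."""
    warnings = []
    # The post asks for utilization to stay below 50%, so 50% itself is flagged.
    if levels.pool_utilization_pct >= MAX_POOL_UTILIZATION_PCT:
        warnings.append(
            f"{levels.name}: storage pool utilization {levels.pool_utilization_pct}%"
        )
    if levels.node_utilization_pct >= MAX_NODE_UTILIZATION_PCT:
        warnings.append(
            f"{levels.name}: utilization {levels.node_utilization_pct}%"
        )
    # The post allows capacity up to a maximum of 85%, so only values above it are flagged.
    if levels.pool_capacity_pct > MAX_POOL_CAPACITY_PCT:
        warnings.append(
            f"{levels.name}: capacity {levels.pool_capacity_pct}% full"
        )
    return warnings


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    for pool in (
        WaterLevels("pool_a", 35.0, 40.0, 70.0),
        WaterLevels("pool_b", 62.0, 45.0, 90.0),
    ):
        for warning in check_water_levels(pool):
            print("WARNING", warning)
```

A pool that returns no warnings is within the happy-system guidelines.  As noted above, a warning does not necessarily mean the system is performing badly; it means the bucket is fuller than we would like.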

 

Filtering through the Noise

I did some research on what it is like to pilot an airplane.  Picture a cockpit with dozens of flight instruments to monitor while flying the plane.  It’s too much to monitor all of them at once, so what are the two most important instruments to monitor to keep flying and avoid crashing?  They are the altitude indicator and the airspeed indicator.  Now that we have defined the metrics for happy systems in clustered Data ONTAP and mapped them to OCI terms, where can we find them?  It’s easy: check out these two screenshots.

 

Command Line Interface (CLI)

 

If you are an old-school CLI fan, then I suggest you step into the new school and embrace OCI.  Stetson, our resident NetApp IT storage specialist, has found that he can monitor, analyze, and address about 75-80% of user performance issues with OCI.  For the deep performance issues, you can rely on NetApp Support and perfstats to bail you out.

 

Tune in next time as we talk about how NetApp IT defines service levels and leverages clustered Data ONTAP to meet them.