By Dave Glatfelter, Senior Product Manager, NetApp with assistance from Mike McNamara, Sr. Manager, Product Marketing, NetApp
Managing storage for enterprise applications is largely about managing performance—planning for new applications, growing or redeploying existing applications, or just trying to get the full potential out of your existing systems. Time spent on performance monitoring, management, and diagnostics is a major part of every IT department’s job, and storage is always in the thick of it.
However, storage performance is complex. With various applications, operating systems, drivers, networks, storage operating systems, and caching at every layer, managing the performance of your storage takes a great deal of knowledge about the system and a lot of experience to accurately predict the effect of any specific change. This knowledge is often beyond the capabilities of your IT staff (and sometimes beyond the capabilities of the vendors themselves). Thus, performance diagnostics and tuning are something of an art.
One way of making this less art and more science is to make sure that systems are well instrumented. They must collect all the information needed for making informed decisions about performance and measuring the effect of changes. This is what “analytics” is all about: collecting the details of every aspect of our array’s performance, and presenting them so that they can be acted upon and results can be determined.
Storage systems collect and display basic performance information through their management consoles, APIs, or both. IOPS, block sizes, read/write mix, and latencies are pretty common. But these metrics tell only part of the story, and it’s usually up to administrators to try to tie these array-centric metrics back to their applications. It’s even harder to use these metrics to predict the results from configuration changes or tuning; there’s just not enough information!
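To make these basic metrics concrete, here is a minimal Python sketch of how IOPS, average block size, read/write mix, and average latency could be derived from a list of completed I/Os. The `IoRecord` type, its field names, and the `summarize` function are assumptions made for illustration; they are not the SANtricity data format or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IoRecord:
    """One completed I/O (hypothetical record layout, for illustration only)."""
    op: str            # "read" or "write"
    size_bytes: int    # transfer size of this I/O
    latency_ms: float  # time from submission to completion


def summarize(records: List[IoRecord], window_s: float) -> Dict[str, float]:
    """Compute array-centric metrics for I/Os completed in a window of window_s seconds."""
    if not records or window_s <= 0:
        raise ValueError("need at least one record and a positive window")
    total = len(records)
    reads = sum(1 for r in records if r.op == "read")
    return {
        "iops": total / window_s,
        "avg_block_bytes": sum(r.size_bytes for r in records) / total,
        "read_fraction": reads / total,
        "avg_latency_ms": sum(r.latency_ms for r in records) / total,
    }


sample = [
    IoRecord("read", 4096, 1.0),
    IoRecord("read", 4096, 2.0),
    IoRecord("write", 8192, 3.0),
    IoRecord("read", 16384, 2.0),
]
stats = summarize(sample, window_s=2.0)
```

Note that nothing in this summary says which application issued the I/Os or how the numbers would shift after a configuration change, which is the gap described above.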
For the last three releases, NetApp® SANtricity® OS, the operating system that runs on NetApp E-Series storage arrays, has been adding to the set of performance metrics that are tracked. We’ve always had the basic performance metrics, but we’ve added more “workload metrics” such as working set size, cache utilization, controller utilization, and device-specific utilization. SANtricity OS writes this data into short-duration logs that can then be accessed through API calls and offloaded to management GUIs, support bundles, and performance tools. Taken together, these workload metrics increase our ability to monitor, manage, diagnose, and characterize workloads on our array.
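Working set size is probably the least familiar of these workload metrics, so a small illustration may help. The Python sketch below estimates it as the amount of distinct data touched during an observation window, counted in fixed-size blocks. The `(offset, length)` trace format, the 4 KiB block size, and the function name are assumptions chosen for the example; this is not a description of how SANtricity OS computes the metric internally.

```python
from typing import Iterable, Tuple


def working_set_bytes(accesses: Iterable[Tuple[int, int]], block_size: int = 4096) -> int:
    """Estimate working set size from (offset_bytes, length_bytes) accesses.

    Every block touched at least once in the window counts exactly once,
    no matter how many times it is re-read or rewritten.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    touched = set()
    for offset, length in accesses:
        if length <= 0:
            continue  # skip empty or malformed entries
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


trace = [
    (0, 4096),      # block 0
    (0, 4096),      # block 0 again, adds nothing to the working set
    (4096, 8192),   # blocks 1 and 2
    (10000, 100),   # falls inside block 2
]
ws = working_set_bytes(trace)
```

Comparing a figure like this against the available cache capacity gives a rough indication of whether a workload can be served mostly from cache.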
One thing we can do with this new data is to automate performance tuning (so you don’t have to do it). With the recently released SANtricity 11.30, we are now using analytics to perform automated tuning with our SSD cache, and in a future release we will be implementing adaptive write caching and automated write consolidation for flash. Many areas of the system can be automatically tuned to adapt to ever-changing workloads by using analytics data. Analytics can also spot problems and autocorrect in many cases, so good analytics can speed up problem diagnostics and resolution.
Analytics data is also useful to our support people. Workload analytics information can be captured and made available to support staff through the SANtricity Workload Analysis Tool (see figure 1 below). Additionally, some workload summary information is now included in the NetApp ASUP™ support bundles. Once trace information is captured for a specific workload, we can use other tools to “replay” the workload, thus providing a ready-made set of benchmarks covering a wide variety of applications. This capability is a huge boon to our benchmarking and solutions teams, because they can now quickly test different attributes and configuration changes to see the effect immediately.
Ultimately, analytics data will be used for a broader range of automation and tools for both NetApp Support and customers, as well as core performance tuning within the OS. Many of the performance optimizations added to SANtricity come from an understanding of application I/O characteristics.
Lastly, more analytics data will be made available to you through the console and array APIs. This allows us to provide performance analysis “wizards” that help you find the root causes of performance problems and give suggestions on what you can do to solve them. Workload analytics data also helps you focus on the right area; it’s a waste of time to focus on the array when the bottleneck is somewhere upstream.
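As a simple illustration of focusing on the right area, the Python sketch below compares the latency an application host observes with the latency the array reports for the same workload. If the array accounts for only a small share of the total, the bottleneck is more likely upstream (host, drivers, or network). The function name, the inputs, and the 50% threshold are all assumptions for the example, not part of any NetApp tool.

```python
def locate_bottleneck(host_latency_ms: float, array_latency_ms: float,
                      array_share_threshold: float = 0.5) -> str:
    """Return "array" or "upstream" based on the array's share of end-to-end latency.

    host_latency_ms:  average latency measured at the application host
    array_latency_ms: average latency reported by the array for the same I/Os
    """
    if host_latency_ms <= 0 or array_latency_ms < 0:
        raise ValueError("latencies must be positive (host) and non-negative (array)")
    array_share = array_latency_ms / host_latency_ms
    return "array" if array_share >= array_share_threshold else "upstream"


slow_network = locate_bottleneck(host_latency_ms=10.0, array_latency_ms=2.0)
slow_array = locate_bottleneck(host_latency_ms=10.0, array_latency_ms=8.0)
```

A real analysis wizard would weigh many more signals, but even this crude ratio shows why array-side analytics need to be read alongside host-side measurements.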