We hope that you enjoyed the storage stories and that they inspired you with new ways to solve your business problems using technologies like Clustered Data ONTAP, vol move, AFF, and QoS. We certainly had fun writing them, and we're having even more fun turning them into video. More to come soon...
For today’s discussion, I thought it would be good to talk about new approaches to performance troubleshooting for Clustered ONTAP environments. Within the past year, OnCommand Performance Manager (OPM) 1.0 was released, and OPM 1.1 arrives this fall. If you are interested in a beta release of OPM 1.1, please talk to your NetApp SE, as it is available now. So what is OPM?
OnCommand Performance Manager (OPM) is a tool that provides deep storage performance troubleshooting capabilities for Clustered ONTAP. It helps isolate potential problems and offers concrete solutions to performance issues based upon its system analysis. Whereas OnCommand Insight (OCI) goes wide, monitoring performance across virtualization providers, switches, and most enterprise storage arrays, OPM goes deep into the veins of Clustered ONTAP storage system performance. For the first time in ONTAP performance monitoring history, OPM takes you, the customer, where only NetApp Support has gone before.
So let’s look at how to get started. The steps are simple: plug it in, turn it on, and let it start monitoring your environment. OPM installs in minutes and begins collecting detailed performance stats for every volume in your cluster, assembling a trend view of what your performance is and what it should be. All analysis within OPM is derived from the movement of latency, on the assumption that if latency stays within range, your users will be happy that things are “fast enough.” If latency moves out of the accepted range by more than the trended value, then OPM registers an incident. Incidents let you know that performance may not be where it should be.
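To make the idea concrete, here is a minimal sketch of latency-deviation incident detection. OPM’s actual trending analytics are proprietary and far more sophisticated; this example simply derives an expected range from historical samples (mean plus a standard-deviation band, an assumption for illustration) and flags an incident when observed latency leaves that range. The volume name and sample values are hypothetical.

```python
from statistics import mean, stdev

def expected_range(history, band_width=3.0):
    """Derive an expected latency range from historical samples.

    A simple mean +/- k*stddev band stands in here for OPM's real
    trend analysis, which is more sophisticated.
    """
    mu = mean(history)
    sigma = stdev(history)
    return (max(0.0, mu - band_width * sigma), mu + band_width * sigma)

def check_latency(volume, history, observed_ms):
    """Register an incident if observed latency exceeds the trended range."""
    low, high = expected_range(history)
    if observed_ms > high:
        return (f"Incident: volume {volume} is slow "
                f"({observed_ms:.1f} ms observed, {high:.1f} ms expected ceiling)")
    return None  # latency within range: no incident

# Hypothetical hourly latency samples (ms) for one volume
history = [2.1, 2.3, 1.9, 2.0, 2.2, 2.1, 2.4, 2.0]

print(check_latency("vol_exchange037", history, 9.5))  # well outside the band
print(check_latency("vol_exchange037", history, 2.2))  # within the band
```

The key design point mirrors OPM’s approach: the threshold is not a fixed number you configure, but a range learned from each volume’s own history, so “slow” always means slow relative to that workload’s normal behavior.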
When an incident occurs, look at the incident details page. Each incident is based on the latency of a volume. For example, let’s say “volume vol_exchange037 is slow” for some reason. While the problem description is pretty simple, the analytics run deep: historical performance data is trended, analyzed, and compared to determine why.
This is a three-phase analysis:
With this analysis complete, you have a great starting point for troubleshooting where the problem is occurring on the SAN or storage controller. Stay tuned for Part 2. It’s going to get scary when we talk about Sharks, Bullies, and Victims.