We have a data warehouse that occasionally gets out of control and wreaks havoc on our SATA disks, which in turn causes problems with some of our other volumes on the same aggregate. We are working on moving that data somewhere else, but in the meantime we want to put some monitoring in place. Is it possible in Ops Manager to get alerted whenever any volume's latency goes over a certain amount? This seems to be the best indicator of whether or not we are seeing a problem.
I looked through the list of alarms in Ops Manager and none of them mention latency, but it seems like this should be possible.
Yes, but you need to use the correct tool. Part of the Operations Manager/OnCommand product is a tool called Performance Advisor. It is installed automatically when you install Operations Manager 4.x / OnCommand 5.x. Performance Advisor (PA) continuously monitors performance (every 60 seconds by default) and keeps the data historically. You can define performance thresholds and alarms to notify you when these particular volumes exceed a certain latency, IOPS, or throughput. You can also create thresholds and alarms on the parent aggregate for things like total aggregate IOPS and total throughput. Operations Manager is used more for monitoring capacity, growth and utilization, so it would be the wrong tool for monitoring and alerting on performance.
You just need to use the NetApp Management Console to access PA. You can download the management console from within Operations Manager by going to Setup -> Download Management Console.
Refer to the Operations Manager documentation for instructions on how to use Performance Advisor.
Once Performance Advisor (PA) has collected this data, you can use the NetApp Manageability Software Development Kit (NMSDK) to extract the data from Performance Advisor. You basically write code (Perl, Java, C, .NET) that communicates with Performance Advisor over an API and it can return the data you want. You cannot extract this data from PA using SNMP.
1) Use the Get-NaPerf* cmdlets. If you want to see all of them, type Get-Command Get-NaPerf*
2) Use Invoke-NaSsh "stats show ...", then parse the results.
Neither method is simple unfortunately.
With 1, you have to run the Get-NaPerfData -Name lun -Counters "avg_latency" command, and the result comes back as a cumulative sum rather than a current value. So in most cases you run it once, wait one minute, run it again, and find the delta. With latencies, there is more math you have to do. Not very simple at all. This is almost 100% identical to how Windows Perfmon works, and I suspect the APIs are similar or the same.
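To show what that extra math looks like, here is a rough sketch in Python rather than PowerShell. It assumes the latency counter is a running sum that has to be divided by the change in an operation counter taken at the same two moments; the Sample fields and the function name are my own, not toolkit names, so check which base counter your object actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One reading of a cumulative latency counter and its op counter.

    Both field names are hypothetical; map them to whatever your
    collection method returns.
    """
    latency_sum: float  # cumulative latency total at sample time
    ops: int            # cumulative operation count at sample time


def interval_avg_latency(first: Sample, second: Sample) -> Optional[float]:
    """Average latency per operation between two samples.

    Returns None when no operations completed in the interval (or the
    counters were reset), since the average is undefined in that case.
    """
    delta_ops = second.ops - first.ops
    if delta_ops <= 0:
        return None
    return (second.latency_sum - first.latency_sum) / delta_ops


if __name__ == "__main__":
    # Two made-up readings taken one minute apart.
    before = Sample(latency_sum=1000.0, ops=100)
    after = Sample(latency_sum=4000.0, ops=400)
    print(interval_avg_latency(before, after))
```

The None case matters in practice: an idle LUN gives a zero op delta, and dividing by it would either fail or produce a meaningless number.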
I actually found 2 to be easier. Parse the text from the stats show lun:*:avg_latency (Output is delimited by a colon) to get your latency values.
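For the parsing step, something along these lines works as a starting point. It is only a sketch in Python and assumes each output line looks like object:instance:counter:value, with an optional unit suffix on the value; the sample lines and the 20 ms threshold are made up, so adjust them to what your filer actually prints.

```python
import re
from typing import List, Optional, Tuple

# Number followed by an optional unit suffix, e.g. "0.45ms" or "12".
_VALUE_RE = re.compile(r"([0-9]*\.?[0-9]+)\s*(\S*)")


def parse_stats_line(line: str) -> Optional[Tuple[str, str, str, float]]:
    """Split one colon-delimited line into (object, instance, counter, value).

    Splits from the right so an instance name that itself contains a
    colon is kept intact. Returns None for lines that do not fit the
    assumed layout.
    """
    line = line.strip()
    try:
        head, counter, raw_value = line.rsplit(":", 2)
        obj, instance = head.split(":", 1)
    except ValueError:
        return None
    match = _VALUE_RE.fullmatch(raw_value)
    if match is None:
        return None
    return obj, instance, counter, float(match.group(1))


def over_threshold(lines: List[str], limit: float) -> List[Tuple[str, float]]:
    """Return (instance, value) pairs whose avg_latency exceeds limit."""
    hits = []
    for line in lines:
        parsed = parse_stats_line(line)
        if parsed and parsed[2] == "avg_latency" and parsed[3] > limit:
            hits.append((parsed[1], parsed[3]))
    return hits


if __name__ == "__main__":
    # Made-up sample output; real instance names will differ.
    sample = [
        "lun:/vol/dw/lun0:avg_latency:35.20ms",
        "lun:/vol/app/lun1:avg_latency:0.45ms",
        "some header line",
    ]
    for instance, value in over_threshold(sample, 20.0):
        print(instance, value)
```

Lines that do not match the layout (headers, blanks) are skipped rather than raising, which keeps a scheduled collection job from dying on unexpected output.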
This is incidentally how I collect performance data I want to see more granularly than what OnCommand can show and it works really well.