2011-06-02 09:54 AM
Hi, we've just undergone a major upgrade from 3040 clusters with 1 Gb vifs to 3270s with 10 Gb networking and new 15K RPM aggrs.
I was reviewing a performance tuning doc by NetApp's Tom Hamilton, and it seems a ripe opportunity for a NetApp tool that takes the statit and sysstat outputs and performs at least the initial analysis: checking thresholds for latency, CPU domain, loop saturation, etc.
NetApp is famous for creating great tools for everything.
Does NetApp have, or plan to create, such an "expert system" tool for diagnosing performance bottlenecks?
My sense is that with our upgrade the bottleneck has shifted from the network to somewhere else (loop saturation?).
For example, the analysis for diagnosing loop saturation is not clear to me.
Thanks for any feedback.
2011-06-02 04:29 PM
I'm guessing that if you don't want to go through the work of pulling and graphing a large number of SNMP variables (or using the developer toolkit and grabbing things via XML), then you are going to be pushed towards the Performance Advisor part of DFM/OM, where you can get a bit of an overview of such things. The good old sysstat output is a good place to start. Sending a perfstat to your local NetApp guy might get you a few hints as well.
Why do you think you have a bottleneck?
2011-06-08 09:22 AM
VMware released an award-winning product called vCenter Operations. It takes in all the diagnostic information and uses dynamic heuristics to identify resource constraints in key areas.
This tool helped me identify some important issues I could not have seen unless I was constantly watching, collecting, and analyzing data.
I realize I need the same kind of tool for ONTAP.
Our fiber loops have a switch for 1, 2, or 4 Gb and are currently set to 2 Gb, so one question I have is: how close are we getting to loop saturation?
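As a rough back-of-envelope check: nominal Fibre Channel payload throughput is about 100 MB/s per 1 Gb of link speed in each direction, so roughly 200 MB/s at the 2 Gb setting. The sketch below compares an observed per-loop throughput against that nominal ceiling. The function name and the 150 MB/s sample figure are made up for illustration; the real number would have to come from your own statit or perfstat data, and the practical limit of an arbitrated loop is likely somewhat below the nominal rate.

```python
# Nominal FC payload rate: roughly 100 MB/s per 1 Gb of link speed, per direction.
NOMINAL_MB_PER_S_PER_GBIT = 100.0


def loop_utilization(observed_mb_per_s, link_speed_gbit):
    """Return observed throughput as a fraction of the nominal loop ceiling."""
    if link_speed_gbit not in (1, 2, 4):
        raise ValueError("expected a loop speed of 1, 2 or 4 Gb")
    ceiling = NOMINAL_MB_PER_S_PER_GBIT * link_speed_gbit
    return observed_mb_per_s / ceiling


if __name__ == "__main__":
    sample = 150.0  # hypothetical peak MB/s on one loop, not a measured value
    for speed in (2, 4):
        print(f"{speed} Gb loop: {loop_utilization(sample, speed):.0%} of nominal")
```

Anything that sits near the ceiling for sustained periods would be worth raising with support before flipping the loop speed switch.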
2011-06-08 10:32 AM
I guess if you have deep pockets, you can get yourself DFM/Operations Manager. The Performance Advisor can do at least some minor diagnosis, and you can get graphs of physical interfaces as well. There is the possibility of creating custom thresholds here too.
The biggest problem is the brokenness of the Java "NetApp Management Console" and the relative unreliability of data collection/display.
You could always collect a good deal of the statistics yourself via SNMP and make some simple graphs with MRTG or the like. I don't remember 100%, but I think there are OIDs for each target interface as well.
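If you go the SNMP route, throughput values are typically exposed as ever-increasing counters, so each graph point is the difference between two polls divided by the polling interval, which is essentially what MRTG does for you. A minimal sketch of that step, assuming 32-bit counters and at most one wrap between polls; the sample numbers are invented and no particular OID is implied:

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: the polled OID is a 32-bit counter


def counter_rate(prev_value, curr_value, interval_s, modulus=COUNTER32_MODULUS):
    """Per-second rate between two polls, allowing for a single counter wrap."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Modular subtraction gives the right delta whether or not the counter wrapped.
    delta = (curr_value - prev_value) % modulus
    return delta / interval_s


if __name__ == "__main__":
    # Invented samples taken 300 s apart (a 5-minute polling interval).
    print(counter_rate(1_000_000, 4_000_000, 300))  # no wrap
    print(counter_rate(4_294_967_000, 704, 300))    # counter wrapped once
```

Note that a reboot also resets counters to zero, which this simple version cannot tell apart from a wrap, so the first sample after a reboot should be discarded.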
2011-06-09 03:57 AM
You or your partner could have a look at CMPG (only available to partners).
This tool creates an Excel sheet full of graphs based on the performance data your system generates (hourly stats).
2011-06-20 11:51 AM
From a whizbang tool standpoint, you might be interested in checking out OnCommand Balance, formerly Akorri BalancePoint. It does not rely on an expert system per se but instead uses queueing theory to identify bottlenecks across virtualized infrastructure. I'm not aware that it looks inside NetApp systems to the level of detail of comparing loop utilization.
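To give a feel for why queueing theory is useful for this, the textbook single-server (M/M/1) model says mean response time is the service time divided by (1 - utilization), so latency climbs steeply as a resource approaches saturation. The sketch below is only that textbook formula, not a description of how Balance works internally, and the 5 ms service time is an arbitrary example value.

```python
def mm1_response_time(service_time_ms, utilization):
    """Mean response time for an M/M/1 queue: R = S / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 5.0  # arbitrary example service time, not a NetApp figure
    for u in (0.5, 0.8, 0.9, 0.95):
        print(f"U={u:.0%}: ~{mm1_response_time(service_ms, u):.0f} ms")
```

The point is the shape of the curve: a resource that looks only "mostly busy" can already be adding most of the observed latency.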
From the standpoint of solving your current problem, the quickest approach will be to open a case and also engage your local NetApp contacts. They/we should be able to help you employ some of the tools mentioned above.