2011-04-08 04:16 PM
I'm looking for the 'official' word for a customer that's upset with a PS assessment of their filer's "busyness".
There are no performance problems, but the data we've gathered from our Performance Advisor instance there doesn't match up well with the data from OM (DFM) in the little itty-bitty chart it provides.
I'm guessing that either the two charts pull slightly different counters, or OM just flattens the heck out of the data. Even in an export of the data, the OM average over time doesn't match the PA average, and the PA averages over time are pretty close to the ASUP CM averages (minute captures vs. hourly averages).
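To illustrate the "flattening" suspicion (minute captures vs. hourly averages), here is a toy Python sketch with made-up numbers, not data from PA, OM, or ASUP. The consolidate helper is hypothetical: it averages per-minute samples into hourly buckets. With equal-sized buckets the overall mean survives, but a short peak is smoothed away.

```python
from statistics import mean


def consolidate(samples, window):
    """Average consecutive groups of `window` samples (simple downsampling)."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Made-up per-minute CPU busy % for two hours: steady 20% with a 5-minute burst at 95%.
per_minute = [20.0] * 120
for minute in range(30, 35):
    per_minute[minute] = 95.0

hourly = consolidate(per_minute, 60)

print(max(per_minute))  # 95.0: the burst is obvious in the minute-level data
print(hourly)           # [26.25, 20.0]: the burst almost disappears
print(mean(per_minute), mean(hourly))  # 23.125 23.125: the overall mean is unchanged
```

So averaging alone flattens peaks; if the long-run averages also disagree, the two tools are probably not sampling or consolidating the same way.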
So, what's the grand poobah answer for "Why does OM not match PA data in my charts?"
Thanks for any assistance on this matter. I'll spread the wisdom far and wide (once I get it).
2011-04-11 09:45 PM
The PA CPU graphs should be the more reliable ones. The OM CPU Usage graph gives data based on the CPU usage sampling interval, which is always 15 minutes (hard-coded). There is a host option, "cpuTooBusyThresholdInterval", but its name is a misnomer. This leads to problems with the accuracy of the graphs. So, at the moment, OM graphs basically show an approximate "busyness" pattern over an interval of time rather than reliable units of data. It is quite possible for DFM to detect a peak in usage and still not show it in the OM graph.
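A toy Python sketch of how a peak can be missed, assuming (as stated in this reply) that the graph keeps only one reading per 15 minutes. The per-minute series and the point_sample helper are hypothetical; this is not DFM's actual sampling code.

```python
def point_sample(series, interval):
    """Keep one reading every `interval` minutes and drop everything in between."""
    return series[::interval]


# Made-up per-minute CPU busy % over one hour, with a 3-minute spike at minutes 16-18.
per_minute = [30.0] * 60
for minute in (16, 17, 18):
    per_minute[minute] = 98.0

graph_points = point_sample(per_minute, 15)  # readings at minutes 0, 15, 30, 45

print(max(per_minute))  # 98.0
print(graph_points)     # [30.0, 30.0, 30.0, 30.0]: the spike never shows up
```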
I had many internal discussions with the team about this some time back. Since this one comes from a customer, it is now a candidate for a BURT.
I hope this helps.
2011-04-11 10:21 PM
The PA counters match the ASUP Counter Manager data because both get their data from the ONTAP Counter Manager.
Also, PA reports individual CPUs, whereas the DFM CPU data covers all CPUs on the filer.
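A minimal sketch of why per-CPU and all-CPU numbers can disagree, assuming the all-CPU figure is a simple average across CPUs (the exact formula DFM uses is not stated here). The CPU names and values are made up.

```python
from statistics import mean

# Made-up busy % for each CPU of a 4-CPU filer at one sampling time.
per_cpu = {"cpu0": 95.0, "cpu1": 20.0, "cpu2": 15.0, "cpu3": 10.0}

busiest_cpu = max(per_cpu.values())  # what a per-CPU view would highlight
all_cpus = mean(per_cpu.values())    # assumed all-CPU figure: plain average

print(busiest_cpu, all_cpus)  # 95.0 35.0
```

One hot CPU at 95% ends up as 35% in the combined figure, so the two views can tell very different stories about the same moment.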
Also, the CPU graphs in OM are consolidated, and the level of consolidation depends on which graph you are looking at.
For each database table, the Operations Manager server saves sample values for periods of the following duration:
2011-04-12 09:28 PM
I have to totally contradict you. The CPU stats are collected in OM every 5 minutes by default:
[root@lnx ~]# dfm options list | grep -i cpu | grep -v clien
cpuBusyThresholdInterval 15 minutes
cpuMonInterval 5 minutes
What I have shown are the global options. They can also be customized per filer using the dfm CLI (except for the monitoring interval):
[root@lnx ~]# dfm host set -q
Valid options are
Output stripped for the sake of brevity.
cpuTooBusyThreshold Host CPU Too Busy Threshold (%)
cpuBusyThresholdInterval Host CPU Busy Threshold Interval
So OM collects stats every 5 minutes, but the cpuTooBusyThreshold event is generated only if CPU usage stays above that threshold for the time interval specified in cpuBusyThresholdInterval.
For example, say cpuTooBusyThreshold=95 and cpuMonInterval=5 minutes (the default value). If CPU usage crosses 95% at one sampling time, the event is generated only if it stays above 95% for 15 minutes (that is, for the next two samples as well, since a new sample is collected every 5 minutes). Otherwise, no event is generated. So cpuBusyThresholdInterval basically tries to avoid generating alerts on spikes, and raises them only when high usage persists for a longer time.
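The event logic described above can be modeled in a few lines of Python. This is a hypothetical model of the behavior as described in this post, not DFM code: busy_event_fires is an illustrative name, and "above the threshold" is assumed to mean strictly greater than.

```python
def busy_event_fires(samples, threshold=95.0, mon_interval=5, busy_interval=15):
    """Return True if usage stays above `threshold` for `busy_interval` minutes.

    Models the behavior described in the post: one sample every `mon_interval`
    minutes, and the event fires only after busy_interval // mon_interval
    consecutive samples are over the threshold (3 with these defaults).
    """
    needed = busy_interval // mon_interval
    run = 0  # number of consecutive samples above the threshold so far
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= needed:
            return True
    return False


print(busy_event_fires([40, 97, 60, 50]))      # False: a single spike
print(busy_event_fires([40, 97, 96, 99, 50]))  # True: three consecutive samples over 95
```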
BTW, none of these options are hard-coded. All of them can be customized with the dfm CLI (dfm options set at the global level, dfm host set per host), except for cpuMonInterval, which applies only at the global level.
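A sketch of the global vs. host-level lookup, assuming a host-level value simply overrides the global one. The effective_option helper and the host overrides are hypothetical; the two interval values are the ones from the dfm options list output above, and cpuTooBusyThreshold=95 is the example value from this post.

```python
GLOBAL_ONLY = {"cpuMonInterval"}  # per the post, this option is global-only


def effective_option(name, global_opts, host_opts):
    """Return the value that applies to one host.

    Assumption: a host-level setting overrides the global one, except for
    global-only options, where any host-level value is ignored.
    """
    if name not in GLOBAL_ONLY and name in host_opts:
        return host_opts[name]
    return global_opts[name]


global_opts = {
    "cpuMonInterval": 5,             # minutes, from `dfm options list`
    "cpuBusyThresholdInterval": 15,  # minutes, from `dfm options list`
    "cpuTooBusyThreshold": 95,       # percent, example value from the post
}
host_opts = {"cpuBusyThresholdInterval": 30, "cpuMonInterval": 1}  # hypothetical

print(effective_option("cpuBusyThresholdInterval", global_opts, host_opts))  # 30
print(effective_option("cpuMonInterval", global_opts, host_opts))            # 5
print(effective_option("cpuTooBusyThreshold", global_opts, host_opts))       # 95
```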
Hope this helps. The flattening is due to the consolidation explained in my other post.