Active IQ Unified Manager Discussions

OM CPU graph vs. PA CPU graphs

jmohler

I'm looking for the 'official' word for a customer that's upset with a PS assessment of their filer "busyness".

There are no performance problems, but the data we've gathered from our Performance Advisor instance there doesn't match up well with the data from OM (DFM) in the little itty-bitty chart it provides.

I'm guessing that the two charts pull slightly different counters, OR OM just flattens the heck out of the data. Even in an export of the data, the OM average over time doesn't match the PA average, and the PA averages over time are pretty close to the ASUP CM averages (minute captures vs. hourly averages).

So, what's the grand poobah answer for "Why does OM not match PA data in my charts?"

Thanks for any assistance on this matter. I'll spread the wisdom far and wide (once I get it).

1 ACCEPTED SOLUTION

sinhaa

Hi,

The PA CPU graphs have to be considered the more reliable ones. The OM CPU usage graph gives data based on the CPU usage sampling interval, which is always 15 minutes (hard-coded). There is a host option, "cpuTooBusyThresholdInterval", but it's a misnomer. This leads to problems with the accuracy of the graphs. So basically, OM graphs at the moment show an approximate "busyness" pattern over an interval of time rather than reliable units of data. It is very much possible that a peak usage is detected by DFM and the OM graph still doesn't show it.

I had many internal discussions with the team about this some time back. Since this now comes from a customer, it is a candidate for a burt.

I hope this helps.

warm regards,

Abhishek


adaikkap

Abhishek,

I have to totally contradict you. The CPU stats are collected in OM every 5 minutes by default.

[root@lnx ~]# dfm options list | grep -i cpu | grep -v clien
cpuBusyThresholdInterval              15 minutes
cpuMonInterval                        5 minutes
cpuTooBusyThreshold                   95
[root@lnx ~]#

What I have shown is the global option. It can also be customized per filer with dfm host set (except for the monitoring interval):

[root@lnx ~]# dfm host set -q


Valid options are

Output stripped for sake of brevity.

  cpuTooBusyThreshold       Host CPU Too Busy Threshold (%)
  cpuBusyThresholdInterval  Host CPU Busy Threshold Interval

So OM collects stats every 5 minutes, but the event for cpuTooBusyThreshold is generated only if the value stays above the threshold for the time interval specified in cpuBusyThresholdInterval.

For example, with cpuTooBusyThreshold=95 and cpuMonInterval=5 minutes (the default value): if CPU usage crosses 95 at a sampling time, the event is generated only if it stays there for 15 minutes, that is, for the next two samples as well, since a new sample is collected every 5 minutes. Otherwise it's not. So cpuBusyThresholdInterval basically tries to eliminate alerts being generated on a spike, and raises them only when the usage stays high for a longer time.
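
The sustained-threshold behaviour described above can be illustrated with a small Python sketch. This is not DFM code: the function name `sustained_breach_indices` and its parameters are hypothetical, and it assumes the event fires once the threshold has been exceeded on `busy_interval_min / mon_interval_min` consecutive samples (the first crossing plus the next two, with the values from the post).

```python
def sustained_breach_indices(samples, threshold=95.0,
                             mon_interval_min=5, busy_interval_min=15):
    """Return the sample indices at which a 'too busy' event would fire.

    Assumption (not taken from DFM itself): an event fires when the CPU
    value has been above the threshold on enough consecutive samples to
    cover busy_interval_min, and only once per sustained run.
    """
    required = max(1, busy_interval_min // mon_interval_min)
    events = []
    run = 0  # number of consecutive samples above the threshold so far
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == required:
                events.append(i)
        else:
            run = 0  # a single sample below the threshold resets the run
    return events


# A one-sample spike never produces an event ...
spike_only = sustained_breach_indices([50, 99, 50])
# ... while three consecutive high samples do, at the third one.
sustained = sustained_breach_indices([50, 97, 60, 96, 97, 98, 99, 40])
print(spike_only, sustained)
```

With these made-up inputs, the isolated 99% sample is ignored, and only the run starting at the fourth sample lasts long enough to count.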

BTW, none of the options are hard-coded. All are customizable using the dfm option set CLI, both at the global and host level, except for cpuMonInterval, which applies only at the global level.

Hope this helps. The flattening is due to the consolidation, which is explained in my other post.

Regards

adai

sinhaa

Thanks adai for the correction.


adaikkap

The PA counters match the ASUP Counter Manager because they both get their data from the ONTAP Counter Manager.

Also, it's for an individual CPU, whereas the DFM CPU data is for all CPUs on the

Also, the CPU graphs in OM are consolidated depending upon the graph you are looking at.

For each database table, the Operations Manager server saves sample values for periods of the following duration:

  • Each daily history sample covers 15 minutes.
  • Each weekly history sample covers two hours.
  • Each monthly history sample covers eight hours.
  • Each quarterly history sample covers one day.
  • Each yearly history sample covers four days.
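
To see why longer sample periods flatten a graph, here is a rough Python sketch. It assumes raw samples arrive every 5 minutes and that consolidation is a plain arithmetic mean over each period; the post does not say which function Operations Manager actually uses, and the helper `consolidate` and the data are made up for illustration.

```python
def consolidate(samples, samples_per_bucket):
    """Average consecutive samples into fixed-size buckets.

    Assumption: consolidation is a simple mean. The real Operations
    Manager consolidation function is not stated in this thread.
    """
    buckets = []
    for start in range(0, len(samples), samples_per_bucket):
        chunk = samples[start:start + samples_per_bucket]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


# Two hours of hypothetical 5-minute CPU samples: 20% busy with one 100% peak.
five_min = [20.0] * 24
five_min[7] = 100.0

daily_view = consolidate(five_min, 3)    # 15-minute samples (daily history)
weekly_view = consolidate(five_min, 24)  # two-hour samples (weekly history)

print(max(five_min), max(daily_view), max(weekly_view))
```

Under these assumptions the 100% peak shrinks to under 50% in the 15-minute view and nearly disappears in the two-hour view, which is the kind of flattening seen when comparing an OM chart against finer-grained PA data.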

Regards

adai

jmohler

Great replies, the pair of them.

I could only pick one of them as the "correct" one, which is a shame.
