Harvest avg_processor_busy metrics discrepancy

I noticed that Harvest reports avg_processor_busy in two places in the metrics path:


  1. harvest.xx.cluster.node.node-xx.system.avg_processor_busy
  2. harvest.xx.cluster.node.node-xx.processor.avg_processor_busy


The Grafana dashboard for the Node uses the processor one for the graph in the System Utilization panel.


From time to time (in my environment) it reports unexpected values, well above 100%, as in the screenshot below (the same happens with Kahuna):





I also noticed that if I rely on the avg_processor_busy under system, these anomalies don't seem to be there. So I'm curious whether this is a real issue in my environment or whether the ONTAP counters are playing games with Harvest.


I hope @madden will come to my rescue on this one.







Re: Harvest avg_processor_busy metrics discrepancy

Hi @PabloZorzoli


I looked at the code and the 'system' object avg_processor_busy is collected from the system:node object and passed unmodified (except normalization) through to Graphite. The 'processor' avg_processor_busy is actually calculated in the cdot-processor plugin as (sum of per core processor_busy) / (number of cores). Because you said the Kahuna domain also gets wonky, and that one comes from processor_busy of this same 'processor' object, it implies that either the cluster is returning incorrect values to Harvest or Harvest is processing/summarizing the data incorrectly.
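
To make that calculation concrete, here is a rough Python sketch of the averaging step. It is not the cdot-processor plugin code; the function names and the sample values are made up for illustration, and it only assumes that per-core processor_busy is a percentage.

```python
def avg_processor_busy(per_core_busy):
    """Average busy percentage: (sum of per-core busy) / (number of cores)."""
    if not per_core_busy:
        raise ValueError("need at least one core")
    return sum(per_core_busy) / len(per_core_busy)


def out_of_range(value, low=0.0, high=100.0):
    """True when a percentage falls outside the expected 0-100 range."""
    return not (low <= value <= high)


# Hypothetical per-core processor_busy samples, in percent.
healthy = [20.0, 40.0, 60.0, 80.0]
suspect = [20.0, 40.0, 60.0, 480.0]  # one core carries an impossible value

for name, cores in (("healthy", healthy), ("suspect", suspect)):
    avg = avg_processor_busy(cores)
    print(name, avg, "out of range" if out_of_range(avg) else "ok")
```

In this sketch a single bad per-core sample is enough to push the node average past 100%, which is why seeing the raw per-core responses matters for deciding where the bad value comes from.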


If you can restart the poller with verbose logging enabled (-v option to netapp-worker or netapp-manager) then Harvest will log every response it gets from the cluster and we can investigate which component is to blame.  Also, in Harvest v1.3 I added logfile rotation but forgot to document it!  You might need to add these key/value pairs to your poller config with sufficiently high values to retain enough logs to capture the issue:


logfile_rotate_mb     Size in MB per logfile before it is rotated     5
logfile_rotate_keep   Inactive log is archived to log.1, log.2 etc. Set number of archived logfiles to keep





Re: Harvest avg_processor_busy metrics discrepancy

Thanks for the reply, @madden. I have restarted one of the pollers in verbose mode and will try to catch a recurrence of it.

