The Grafana dashboard for the Node uses the 'processor' object for the graph in the System Utilization panel.
From time to time (in my environment) it shows unexpected values, well above 100%, as in the screenshot below (the same happens with the Kahuna domain):
I also noticed that if I rely on avg_processor_busy under 'system', these anomalies don't seem to be there. So I'm curious whether this is a real issue in my environment or whether the ONTAP counters are playing games with Harvest.
I hope @madden will come to my rescue on this one.
I looked at the code. The 'system' object avg_processor_busy is collected from the system:node object and passed through to Graphite unmodified (except for normalization). The 'processor' avg_processor_busy is actually calculated in the cdot-processor plugin as (sum of per-core processor_busy) / (number of cores). Because you said the Kahuna domain also gets wonky, and that value comes from processor_busy of this same 'processor' object, it implies that either the cluster is returning incorrect values to Harvest, or Harvest is processing/summarizing the data incorrectly.
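As a rough sketch of that calculation (plain Python for illustration, not the actual cdot-processor plugin code), note how a single bad per-core sample from the cluster would push the average past 100%:

```python
def avg_processor_busy(per_core_busy):
    """Average per-core busy percentages into one node-level value:
    (sum of per-core processor_busy) / (number of cores)."""
    if not per_core_busy:
        raise ValueError("no cores reported")
    return sum(per_core_busy) / len(per_core_busy)

# Normal samples stay within 0-100%:
print(avg_processor_busy([40.0, 60.0, 50.0, 50.0]))    # 50.0

# One bogus counter value from the cluster blows past 100%,
# which would look like the spikes in the screenshot:
print(avg_processor_busy([40.0, 60.0, 50.0, 5000.0]))  # 1287.5
```

So a spike in the averaged metric can come from either side: a bad counter returned by the cluster, or a summarization bug in the plugin; the verbose logs below should tell us which.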
If you can restart the poller with verbose logging enabled (the -v option to netapp-worker or netapp-manager), Harvest will log every response it gets from the cluster and we can investigate which component is to blame. Also, in Harvest v1.3 I added logfile rotation but forgot to document it! You might need to add these key/value pairs to your poller config, with sufficiently high values to retain enough logs to capture the issue:
- Size in MB per logfile before it is rotated
- Number of archived logfiles to keep (the inactive log is archived to log.1, log.2, etc.)
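For illustration, a poller section with those two settings might look something like the fragment below. Note that the key names here (`log_maxsize_mb`, `log_maxfiles`) are hypothetical placeholders, since the actual option names aren't shown above; check the Harvest v1.3 config reference or source for the real ones.

```ini
[my-cluster-poller]
hostname       = my-cluster.example.com
# Placeholder names, not confirmed Harvest options:
# rotate each logfile once it reaches 100 MB
log_maxsize_mb = 100
# keep up to 10 archived logs (log.1 .. log.10)
log_maxfiles   = 10
```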
Cheers, Chris Madden
Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)