Hi @PabloZorzoli
I looked at the code, and the 'system' object avg_processor_busy is collected from the system:node object and passed through to graphite unmodified (except for normalization). The 'processor' avg_processor_busy is actually calculated in the cdot-processor plugin as (sum of per-core processor_busy) / (number of cores). Because you said the Kahuna domain also gets wonky, and that metric comes from processor_busy of this same 'processor' object, it implies that either the cluster is returning incorrect values to Harvest or Harvest is processing/summarizing the data incorrectly.
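To make the averaging step concrete, here is a minimal Python sketch of the formula above. It is not the plugin's actual code (the function name and the sample values are made up for illustration); it only shows the arithmetic, plus the observation that a true average can never fall outside the range of the per-core inputs.

```python
def avg_processor_busy(per_core_busy):
    """Mean of per-core processor_busy percentages (hypothetical helper)."""
    if not per_core_busy:
        raise ValueError("need at least one core")
    return sum(per_core_busy) / len(per_core_busy)


# Made-up sample: four cores, busy percentages already normalized to 0-100.
cores = [20.0, 40.0, 60.0, 80.0]
avg = avg_processor_busy(cores)

# An average always lies between the smallest and largest input, so a
# value outside that range points at bad inputs rather than at the math.
in_range = min(cores) <= avg <= max(cores)
```

If the verbose log shows per-core values that are already out of range, the averaged metric will be off as well, which would point at the cluster's response rather than the summarizing step.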
If you can restart the poller with verbose logging enabled (-v option to netapp-worker or netapp-manager) then Harvest will log every response it gets from the cluster and we can investigate which component is to blame. Also, in Harvest v1.3 I added logfile rotation but forgot to document it! You might need to add these key/value pairs to your poller config with sufficiently high values to retain enough logs to capture the issue:
PARAMETER | DESCRIPTION | DEFAULT VALUE |
logfile_rotate_mb | Size in MB per logfile before it is rotated | 5 |
logfile_rotate_keep | Inactive log is archived to log.1, log.2 etc. Set number of archived logfiles to keep | 4 |
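As a rough sketch of how that might look in the poller config: the section name and the values 50 and 10 below are placeholders I picked for illustration, and I'm assuming the usual key = value layout of the config file. Only the two key names and their defaults come from the table above.

```ini
## Hypothetical poller section; use your own poller name
[my-cluster]
## Assumed example values, raised from the defaults of 5 and 4
logfile_rotate_mb   = 50
logfile_rotate_keep = 10
```

With values like these the poller would keep the active log plus up to 10 archives of roughly 50 MB each, so size them to the disk space you have available.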
Cheers,
Chris Madden
Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)
Blog: It all begins with data