

Harvest avg_processor_busy metrics discrepancy

PabloZorzoli

I noticed that Harvest has 2 places in the metrics path where it reports avg_processor_busy:

 

  1. harvest.xx.cluster.node.node-xx.system.avg_processor_busy
  2. harvest.xx.cluster.node.node-xx.processor.avg_processor_busy

 

The Grafana dashboard for the Node uses the processor one for the graph in the System Utilization panel.

 

From time to time (in my environment) it reports unexpected values, far above 100%, as in the screenshot below (the same happens with Kahuna):

 

[Screenshot: Node dashboard]

 

 

I also noticed that if I rely on the avg_processor_busy under system, these anomalies don't seem to be there. So I'm curious whether this is a real issue in my environment or whether the ONTAP counters are playing games with Harvest.

 

I hope @madden will come to my rescue on this one.

 

Pablo

 

 

 


madden

Hi @PabloZorzoli

 

I looked at the code and the 'system' object avg_processor_busy is collected from the system:node object and passed unmodified (except normalization) through to graphite.  The 'processor' avg_processor_busy is actually calculated in the cdot-processor plugin based on (sum of per core processor_busy) / (number of cores).  Because you said Kahuna domain also gets wonky, and this one is from processor_busy of this same 'processor' object, it implies that the cluster is returning incorrect values to Harvest OR Harvest is processing/summarizing the data incorrectly. 
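
To make the 'processor' calculation concrete, here is a minimal Python sketch of the same arithmetic: (sum of per core processor_busy) / (number of cores). It is not the cdot-processor plugin's code; the function names and sample values are made up for illustration. It also shows how a single bad per-core sample is enough to push the average past 100%.

```python
from typing import List, Sequence, Tuple


def avg_processor_busy(per_core_busy: Sequence[float]) -> float:
    """Mean of per-core busy percentages: sum of samples / number of cores."""
    if not per_core_busy:
        raise ValueError("need at least one per-core sample")
    return sum(per_core_busy) / len(per_core_busy)


def out_of_range(per_core_busy: Sequence[float],
                 low: float = 0.0,
                 high: float = 100.0) -> List[Tuple[int, float]]:
    """Return (core index, value) for samples outside the expected 0-100% range."""
    return [(i, v) for i, v in enumerate(per_core_busy) if not low <= v <= high]


# Hypothetical samples: four cores, one of them reporting a bogus value.
healthy = [20.0, 40.0, 60.0, 80.0]
suspect = [20.0, 40.0, 60.0, 800.0]

print(avg_processor_busy(healthy))  # 50.0
print(avg_processor_busy(suspect))  # 230.0
print(out_of_range(suspect))        # [(3, 800.0)]
```

If the verbose log shows per-core values like the last one coming straight from the cluster, the cluster is the likely culprit; if the raw values look sane, the summarization is.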

 

If you can restart the poller with verbose logging enabled (-v option to netapp-worker or netapp-manager) then Harvest will log every response it gets from the cluster and we can investigate which component is to blame.  Also, in Harvest v1.3 I added logfile rotation but forgot to document it!  You might need to add these key/value pairs to your poller config with sufficiently high values to retain enough logs to capture the issue:

 

PARAMETER             DESCRIPTION                                                                                    DEFAULT VALUE
logfile_rotate_mb     Size in MB per logfile before it is rotated                                                    5
logfile_rotate_keep   Inactive log is archived to log.1, log.2 etc. Set number of archived logfiles to keep         4
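
As a rough sketch, the two keys might be added to a poller section like this. The section name and the values are placeholders, and the key = value layout is assumed to match the rest of your poller config; pick values large enough to cover the time between occurrences of the issue.

```ini
; Hypothetical poller section; replace with your own poller name
[my-cluster-poller]
; Rotate the logfile once it reaches 50 MB (default is 5)
logfile_rotate_mb = 50
; Keep 20 archived logfiles: log.1, log.2, ... (default is 4)
logfile_rotate_keep = 20
```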

 

 

 

Cheers,
Chris Madden

Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)

Blog: It all begins with data

 

If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!

PabloZorzoli

Thanks for the reply, @madden. I have restarted one of the pollers in verbose mode and will try to catch a recurrence.
