Solved: Harvest Poller stops running after extended period of time with connection issues

acjackson · ‎2017-06-20

Hi,

I have an issue with two Harvest pollers that stop working after occasional connection/network issues that last longer than ~4 hours.

There are log entries like

[WARNING] [nfsv3] update of data cache failed with reason: Server returned HTTP Error: 408 Request

and

[WARNING] [path] update of data cache failed with reason: in Zapi::invoke, cannot connect to socket

Shouldn't the pollers run forever? Or is there any timeout? I only get log entries for about 4 hours after the first error, then there are no more entries, until I restart the poller.

Then the pollers work until the next extended period of time with connection issues.

madden · ‎2017-06-21

Hi @acjackson

The design is for Harvest to try forever. But, there are some other modules it uses (SSL, NetApp SDK to name a few) that may consider some situations fatal. If I knew the place it's failing I could potentially wrap this to prevent it but I'm inclined to look for a solution outside of Harvest.

One solution could be to use supervisord to [re]start each of your harvest pollers with a config variable of autorestart=1 (here and here).

Another solution, which is simpler if you are OK with missing a few minutes of data, is to just add a cron entry to run "/opt/netapp-harvest/netapp-manager -start" every 10 minutes. This script just parses the netapp-harvest.conf file, runs ps, and then starts pollers that are not already running.
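
As a rough sketch, the cron entry could look like the line below. It assumes the default install path quoted above and that it goes into the crontab (crontab -e) of the user that normally runs Harvest. Adjust both if your setup differs.

```shell
# Assumption: Harvest lives in /opt/netapp-harvest and this crontab belongs
# to the user that normally runs the pollers.
# Every 10 minutes, start any configured pollers that are not already running.
*/10 * * * * /opt/netapp-harvest/netapp-manager -start
```

With this in place, a poller that dies should be missing for at most about 10 minutes before it is started again.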

I know these are workarounds but I think they are the best options for you. Hope it helps!

Cheers,
Chris Madden

Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)

Blog: It all begins with data

If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!
