Harvest Poller stops running after extended period of time with connection issues

acjackson

Hi,

 

I have an issue with two Harvest pollers that stop working after occasional connection/network issues lasting longer than about 4 hours.

 

There are log entries like

 

[WARNING] [nfsv3] update of data cache failed with reason: Server returned HTTP Error: 408 Request

and 

[WARNING] [path] update of data cache failed with reason: in Zapi::invoke, cannot connect to socket

 

Shouldn't the pollers run forever, or is there a timeout? I only get log entries for about 4 hours after the first error; after that there are no more entries until I restart the poller.

The pollers then work until the next extended period of connection issues.

1 ACCEPTED SOLUTION

madden

Hi @acjackson

 

The design is for Harvest to try forever. However, it uses some other modules (SSL and the NetApp SDK, to name a few) that may treat certain situations as fatal. If I knew where it is failing, I could potentially wrap that call to prevent the exit, but I'm inclined to look for a solution outside of Harvest.

 

One solution could be to use supervisord to [re]start each of your Harvest pollers, with the config variable autorestart=1.
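As a rough illustration of the supervisord approach, a program section for one poller might look like the fragment below. This is only a sketch under assumptions: the file path and the poller name `cluster1` are hypothetical, and the command line (launching `netapp-worker` with `-poller` in the foreground) is an assumption that should be checked against the help output of your Harvest version. supervisord can only supervise processes that stay in the foreground, so no daemonize option is passed.

```ini
; Hypothetical file: /etc/supervisord.d/harvest-cluster1.ini
; One [program:...] section per Harvest poller.
[program:harvest-cluster1]
; Assumption: netapp-worker runs the named poller in the foreground.
command=/opt/netapp-harvest/netapp-worker -poller cluster1
directory=/opt/netapp-harvest
autostart=true
; Restart the poller whenever it exits; a truthy value such as 1 should behave the same.
autorestart=true
; The process must stay up this many seconds to count as successfully started.
startsecs=10
```

With this approach the pollers should be started and stopped through supervisord rather than through netapp-manager, so that the two do not both try to launch the same poller.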

 

Another solution, simpler if you are OK with missing a few minutes of data, is to add a cron entry that runs "/opt/netapp-harvest/netapp-manager -start" every 10 minutes. This script just parses the netapp-harvest.conf file, runs ps, and then starts any pollers that are not already running.
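The cron workaround could be written as the crontab line below. The command and the 10-minute interval come from the post; the output redirection is an added assumption to keep cron from mailing the script's output on every run, and can be dropped or pointed at a log file instead.

```
# Crontab of the user that runs Harvest (edit with: crontab -e)
# Every 10 minutes, start any configured pollers that are not already running.
*/10 * * * * /opt/netapp-harvest/netapp-manager -start >/dev/null 2>&1
```

Since a stopped poller is only noticed on the next run, up to roughly 10 minutes of data can be missing after a poller exits.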

 

I know these are workarounds but I think they are the best options for you.  Hope it helps!

 

Cheers,
Chris Madden

Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)

Blog: It all begins with data


acjackson

Thx, I have set up a cron job and it seems to work. :)

I have a log file with debugging enabled that I could PM to you; maybe it helps you find the issue.
