Hi @Michael_Orr
The design intent of Harvest is that it will never die. It may fail to start if some required info is missing from the conf file (like the hostname of the cluster to monitor), but even if it can't connect due to a DNS resolution failure it will keep retrying, hoping someone fixes DNS 🙂
So if it dies, then either there is a bug in Harvest (we recently found a divide-by-zero error that occurred when a port was online at 10 Mbit) or something else is causing a failure in another module (the NetApp SDK or a module it uses), resulting in it dying. In principle a busy cluster can still be monitored; it just might not be able to keep up with all counters every 60s. It could also be that, because the cluster is very busy, some API call responses are incomplete or truncated. If you can send me the entire poller logfile (via private message is fine) I can potentially figure out what is going on.
Until then, a workaround would be to add a crontab entry that runs netapp-manager -start every hour (or every minute, if you really want to minimize missed data in case it dies). netapp-manager will find any pollers that are not running and start them; if all are already running it does nothing.
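For example, a crontab entry along these lines would do it (just a sketch; /opt/netapp-harvest is the default install path, so adjust it to wherever Harvest is installed on your system):

0 * * * * /opt/netapp-harvest/netapp-manager -start >/dev/null 2>&1

This runs at the top of every hour and silently restarts any stopped pollers; change the schedule to "* * * * *" if you want it checked every minute.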
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!