In the Harvest admin guide 1.2.2 section 2.1 I have this snippet:
Typical bandwidth usage from Harvest to the monitored node is ~ 15Kbps, and from the monitored node
to Harvest 90Kbps. Again, as instance count increases the bandwidth used will as well.
If you have remote nodes with many monitored instances (i.e. many vols, luns, lifs, etc) and significant
network latency (20ms+) it may be beneficial to deploy a Harvest poller host local to those nodes and
send metrics over the WAN to a central Graphite server. In this way the Harvest polls will not be
unnecessarily delayed by network latency. To determine if having a local poller would be beneficial, test
running Harvest from the remote site and compare the poll duration to the poll update frequency (use the
Grafana Harvest dashboard or start netapp-worker with the -v flag). If the poll duration is much less
than the frequency then it is fine to poll from the central site. But if not, placing the poller on a host near
the monitored system is recommended.
Maybe this helps?
The communication between Harvest and the cluster is quite chatty (lots of API requests and responses), and the WAN latency adds to each request. For a small cluster, collection might take 10s locally vs. 30s over the WAN. Still, as long as it stays under 60s, no worries! Communication between Harvest and Graphite is one-way and carries less data, so I don't think WAN latency will matter much there.
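To make the "latency adds to each request" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the API calls are issued one after another, so each call pays one extra round trip. The function name and the figure of 1000 API calls per poll are hypothetical, chosen only to illustrate the arithmetic; they are not measured Harvest values.

```python
def estimate_wan_poll_seconds(local_poll_s, api_calls, rtt_ms):
    """Estimate poll duration over a WAN.

    Assumes sequential API calls, each paying one extra round trip.
    This is a simplification, not how Harvest is documented to behave.
    """
    extra_s = api_calls * rtt_ms / 1000.0  # total added latency in seconds
    return local_poll_s + extra_s


# Hypothetical example: 10 s local poll, 1000 API calls, 20 ms round trip
estimate = estimate_wan_poll_seconds(10.0, 1000, 20.0)
print(f"Estimated WAN poll duration: {estimate:.1f} s")
```

With these made-up inputs the extra latency alone is 20 s, which is the kind of gap between local and WAN collection described above.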
For simplicity I would first try to collect over the WAN. If you see skipped polls, then set up a Harvest instance local to the cluster. In all cases I think a central Graphite server will be fine.
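If you want to turn that rule of thumb into a quick check, a minimal sketch could look like the following. The 60 s default interval comes from the figure mentioned above; the function name and the 0.8 headroom factor are my own assumptions, so adjust them to your setup. Feed it the poll duration you observe in the Grafana Harvest dashboard or in the netapp-worker -v output.

```python
def should_poll_locally(poll_duration_s, poll_interval_s=60.0, headroom=0.8):
    """Return True if a poller near the cluster looks worthwhile.

    headroom is a hypothetical safety margin: a poll that uses more than
    this fraction of the interval is treated as at risk of being skipped.
    """
    return poll_duration_s >= poll_interval_s * headroom


# Observed durations in seconds (example values, not real measurements)
for duration in (10.0, 30.0, 55.0):
    advice = "deploy a local poller" if should_poll_locally(duration) else "central polling is fine"
    print(f"{duration:.0f} s poll: {advice}")
```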
Hope this helps!
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data