I also see vol_summary.avg_latency graphs go through the roof since updating to 9.4P2. I see values of up to 8 seconds (!) instead of milliseconds. The other latency counters look normal. So far it only affects our SnapMirror target.
I did the following check, but I still see the vol_summary.avg_latency graphs go through the roof since updating to 9.4P2. Any more input?
root@:~# apt list --installed|grep netapp
WARNING: apt does not have a stable CLI interface yet. Use with caution in scripts.
netapp-harvest/now 1.4.1 all [installed,local]
Maybe something is wrong with the counter itself, not with Harvest?
One could verify with "statistics show-periodic" what values ONTAP actually reports. If those are incorrect, the fix should be implemented in ONTAP, not in Harvest. A workaround in Harvest might be possible in the meantime, though.
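Something along these lines should show the raw node-level values (note: I'm assuming here that Harvest's vol_summary maps to the volume:node perf object; if your ONTAP version doesn't expose that object, try -object volume against an individual volume instead, and the prompt/interval values are just placeholders):

cluster1::> statistics show-periodic -object volume:node -counter avg_latency -interval 5 -iterations 12

If that also reports seconds-range latencies, the bug is in the ONTAP counter rather than in Harvest's collection.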
I don't have any 9.4 systems around, so I cannot check this myself.
We did this as part of our standard upgrade process, but we're seeing very strange readings for netapp.perf.$Group.$Cluster.node.*.vol_summary.avg_latency after the upgrade. Before the upgrade we were seeing latencies of roughly 0 to 20 ms, but after the upgrade latencies are reported in the minutes. At first I thought this was due to a change in the magnitude of some of the counters, but manually telling Grafana that the metric was in e.g. µs still shows data that doesn't look right (800 ms latency).
Has the meaning of this counter changed on the NetApp side? Is there a way for us to reinterpret this data on the Harvest/Grafana side?
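For reference: if this turned out to be just a fixed unit change (say, the counter now reporting microseconds where it used to report milliseconds), we could presumably band-aid it on the Grafana side with Graphite's scale() function, e.g.

scale(netapp.perf.$Group.$Cluster.node.*.vol_summary.avg_latency, 0.001)

(the 0.001 factor is purely illustrative), but we'd rather understand what actually changed.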
Thanks a lot for the info. This is an issue we are trying to address right now, but we are still looking for the exact cause.
Can you help us with a few more details?
- Does this happen all the time or only occasionally?
- Did you look in the Harvest logs for any warnings? (If so, can you copy-paste them? A grep like the sketch after this list should surface them.)
- Which ONTAP systems do you have? How many nodes are in your cluster?
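For the logs, a grep along these lines should surface anything relevant (the path assumes a default Harvest 1.x install under /opt/netapp-harvest; adjust to your layout):

grep -iE 'warn|error' /opt/netapp-harvest/log/*.log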
Thanks in advance!