TL;DR: Harvest reports abnormally high latency and IOPS values for Ontap 9.4 systems. We are preparing a workaround that partially solves this. In the meantime, you can use our tool to analyze your data.
Over the past several months, many Harvest users have reported abnormally high latency and IOPS values in Grafana. We have been investigating the matter extensively, and I want to give you an update on what we found and which solutions we are preparing.
Harvest collects latency and IOPS counters from Ontap systems. Usually each counter is associated with a volume object, and Harvest calculates weighted averages and totals for SVMs and nodes. However, we found that the avg_latency and total_ops counters we get from Ontap 9.4 systems do not match the corresponding read, write and other latency and IOPS counters.
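As a rough illustration of the roll-up described above, the sketch below aggregates per-volume counters into a total and an ops-weighted average latency for a group such as an SVM or a node. The VolumeSample type and its field names are hypothetical and only serve the example; they do not reflect Harvest's internal data structures.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class VolumeSample:
    """Hypothetical per-volume sample; not Harvest's real data model."""
    total_ops: float     # operations per second on this volume
    avg_latency: float   # average latency per operation on this volume


def roll_up(volumes: Iterable[VolumeSample]) -> Tuple[float, float]:
    """Return (total_ops, ops-weighted avg_latency) for a group of volumes."""
    volumes = list(volumes)
    total_ops = sum(v.total_ops for v in volumes)
    if total_ops == 0:
        return 0.0, 0.0  # idle group: avoid dividing by zero
    # Each volume contributes to the average in proportion to its operations.
    weighted = sum(v.total_ops * v.avg_latency for v in volumes)
    return total_ops, weighted / total_ops
```

Weighting by operations means a busy volume influences the group latency more than an idle one, which is why an inflated total_ops or avg_latency on a single volume also distorts the SVM and node values.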
This means that Grafana might show an average latency that is higher than the read, write and other latencies. The same goes for IOPS: total IOPS might be shown as higher than the sum of read, write and other operations. This only happens with 9.4 systems; older and newer releases of Ontap send counter values that match our expectations.
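The two symptoms above can be expressed as a simple check on a single sample. The following is a minimal sketch, not part of Harvest: the function name, the parameter names and the 5% tolerance are assumptions chosen for illustration.

```python
def find_inconsistencies(read_ops, write_ops, other_ops, total_ops,
                         read_latency, write_latency, other_latency,
                         avg_latency, tolerance=0.05):
    """Return a list of human-readable problems found in one sample.

    The tolerance (an assumed 5%) allows for rounding and sampling jitter.
    """
    problems = []

    # Symptom 1: total IOPS higher than the sum of its components.
    expected_ops = read_ops + write_ops + other_ops
    if total_ops > expected_ops * (1 + tolerance):
        problems.append("total_ops exceeds read + write + other")

    # Symptom 2: a weighted average can never exceed its largest component,
    # so only latencies of operation types that actually occurred are compared.
    pairs = ((read_ops, read_latency),
             (write_ops, write_latency),
             (other_ops, other_latency))
    active = [latency for ops, latency in pairs if ops > 0]
    if active and avg_latency > max(active) * (1 + tolerance):
        problems.append("avg_latency exceeds every component latency")

    return problems
```

An empty list means the sample looks consistent under these assumptions; it does not prove that the counters are correct.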
So the cause of the issue lies outside Harvest itself. It seems that the Counter Manager in Ontap 9.4 releases adds indirect operations to avg_latency and total_ops, but not to the corresponding read, write and other counters. We have observed similar disparities in some other counters as well. If you use Harvest with Ontap 9.4 systems, you will probably see a lot of warnings in your logs.
The good news: we are preparing a workaround. We have developed a new plugin that calculates the avg_latency and total_ops values internally, so that they match the read, write and other counters. We plan to make this workaround available in a new release of Harvest, 1.4.2, in February.
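To make the idea of such a recalculation concrete, here is a minimal sketch that assumes total_ops is the sum of the read, write and other operations and avg_latency is their ops-weighted mean. It is an illustration only and not the plugin's code; the function and parameter names are hypothetical.

```python
def recompute_totals(read_ops, write_ops, other_ops,
                     read_latency, write_latency, other_latency):
    """Derive (total_ops, avg_latency) from the component counters only."""
    total_ops = read_ops + write_ops + other_ops
    if total_ops == 0:
        return 0.0, 0.0  # no operations: latency is reported as zero here
    # Weight each latency by the number of operations of that type.
    weighted = (read_ops * read_latency
                + write_ops * write_latency
                + other_ops * other_latency)
    return total_ops, weighted / total_ops
```

Values derived this way always agree with the component counters, but they leave out whatever the reported totals contain beyond those components.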
Meanwhile, if you don't want to wait and would like to help us test the plugin, feel free to contact the NetApp Harvest developers team.
Finally, you can use this tool here to diagnose your whisper data and see if you have inconsistent latencies or IOPS.
The tool is only meant to check whether your latency counters are accurate. It does not fix what you see in Grafana. Please stay tuned for Harvest 1.4.2, which will actually correct the counters in Grafana.