Active IQ Unified Manager Discussions
Hi,
Starting about a week ago, a volume exported over NFS on a FAS8060 (cDOT 8.3.2P8) started showing inconsistent, and possibly wrong, NFS throughput metrics in Harvest/Grafana. More specifically:
- total throughput is about 3GB/s, mostly writes.
- NFS overall throughput is again 3GB/s
- NFSv4 and v4.1 are disabled (and no data are shown in Harvest/Grafana accordingly)
- NFSv3 throughput is much lower, about 200MB/s
- I/O BlockSize as reported in Harvest changed from 2 to 40 MB around the same time
Independent network metrics are consistent with the 200MB/s figure, so the 3GB/s throughput value appears to be wrong, or at least not caused by real network traffic. Also note that the volume was moved off another node in the same cluster shortly before the onset of this problem, although I cannot be sure of the exact timing. Could this be a problem with Harvest? Is there another possible explanation for this behaviour? Thanks.
Regards
Federico
Solved! See The Solution
I have seen other reports of much higher throughput from the 'volume' counters, and after researching (running netapp-worker -v and comparing raw counter data) I discovered that the underlying ONTAP counters were incorrect; garbage in, garbage out, as they say.
An easy way to tell if the raw counters are buggy is to check statistics from the CLI (lun_2 is my volume name):
sdt-cdot1::> statistics show-periodic -object volume -instance lun_2 -counter write_data|instance_name

sdt-cdot1: volume.lun_2: 9/6/2016 04:11:34

 instance    write
     name     data
 -------- --------
    lun_2   55.2MB
    lun_2   54.3MB
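The per-interval figures above come from sampling a cumulative counter twice and dividing the delta by the interval, which is essentially what Harvest does with the raw counters. A minimal sketch of that arithmetic (the sample values below are hypothetical, chosen only to reproduce a 55.2MB/s interval):

```python
# Sketch: derive a throughput rate from two samples of a cumulative
# (monotonically increasing) byte counter, as a collector like Harvest
# does internally: rate = (c2 - c1) / (t2 - t1).

def counter_rate(c1, t1, c2, t2):
    """Bytes/sec from two samples of a monotonically increasing counter."""
    if t2 <= t1:
        raise ValueError("samples must be in time order")
    return (c2 - c1) / (t2 - t1)

# Hypothetical samples 5 seconds apart: the counter grew by 276 MB
rate = counter_rate(1_000_000_000, 0, 1_276_000_000, 5)
print(f"{rate / 1e6:.1f} MB/s")  # 55.2 MB/s
```

The point is that the collector only does this division; if the underlying write_data counter jumps by a bogus amount, the reported rate is bogus too.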
If you see that the write_data value is unrealistically high, open a support case and suggest they look at:
bug 1048529 - "write_data value in volume stats is unreliable"
By the way, the QoS-based counters should still be accurate, so on the Volume dashboard you could also check the QoS rows and see whether there is a big difference between those counter values and the ones from the wafl/volume row on the same dashboard.
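That comparison can also be automated once you have both rates per volume. A minimal sketch, assuming you have already extracted per-volume throughput from the volume-object counters and from the QoS workload counters (the volume names and numbers below are hypothetical):

```python
# Sketch: flag volumes whose volume-object throughput diverges from the
# QoS workload throughput by more than a tolerance factor -- a symptom
# of the unreliable write_data counter described above.

def flag_divergent(volume_bps, qos_bps, factor=2.0):
    """Return names of volumes whose volume-object rate exceeds the
    QoS-based rate by more than `factor`."""
    suspects = []
    for name, v_rate in volume_bps.items():
        q_rate = qos_bps.get(name)
        if q_rate and v_rate > factor * q_rate:
            suspects.append(name)
    return suspects

# Hypothetical per-volume throughput in bytes/sec from the two sources
volume_bps = {"lun_2": 3_000_000_000, "vol_ok": 210_000_000}
qos_bps    = {"lun_2":   200_000_000, "vol_ok": 205_000_000}

print(flag_divergent(volume_bps, qos_bps))  # ['lun_2']
```

A 3GB/s volume-counter rate against a 200MB/s QoS rate, as in the original post, would be flagged immediately.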
Cheers,
Chris Madden
Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!
Hi @madden,
Thank you very much for your insightful reply. Indeed, our raw counters match the unrealistic numbers shown by Harvest. I have opened a support case suggesting they look at that bug. We can declare Harvest "not guilty" 🙂 Thanks for your help!
Cheers
Federico