Active IQ Unified Manager Discussions
Hi,
Starting about a week ago, a volume exported over NFS on a FAS8060 (cDOT 8.3.2P8) started showing inconsistent, and possibly wrong, NFS throughput metrics in Harvest/Grafana. More specifically:
- total throughput is about 3GB/s, mostly writes.
- NFS overall throughput is again 3GB/s
- NFSv4 and v4.1 are disabled (and no data are shown in Harvest/Grafana accordingly)
- NFSv3 throughput is much lower, about 200MB/s
- I/O BlockSize as reported in Harvest changed from 2 to 40 MB around the same time
Independent network metrics are consistent with the 200MB/s figure, so the 3GB/s throughput value appears to be wrong, or at least not caused by real network traffic. Also note that the volume was moved off another node in the same cluster shortly before the onset of this problem, although I cannot be sure of the exact timing. Could this be a problem with Harvest? Is there another possible explanation for this behaviour? Thanks.
Regards
Federico
Solved! See The Solution
I have seen other reports of much higher throughput from the 'volume' counters, and after researching (running netapp-worker -v and comparing raw counter data) I discovered that the underlying ONTAP counters were incorrect; garbage in, garbage out, as they say.
An easy way to tell if the raw counters are buggy is to check statistics from the CLI (lun_2 is my volume name):
sdt-cdot1::> statistics show-periodic -object volume -instance lun_2 -counter write_data|instance_name

sdt-cdot1: volume.lun_2: 9/6/2016 04:11:34

 instance    write
     name     data
 -------- --------
    lun_2   55.2MB
    lun_2   54.3MB
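The per-interval figures above come from sampling a cumulative counter twice and dividing the delta by the interval, which is essentially what Harvest does with the raw counters. A minimal sketch of that arithmetic (the sample values below are hypothetical, chosen only to reproduce a 55.2MB/s interval):

```python
# Sketch: derive a throughput rate from two samples of a cumulative
# (monotonically increasing) byte counter, as a collector like Harvest
# does internally: rate = (c2 - c1) / (t2 - t1).

def counter_rate(c1, t1, c2, t2):
    """Bytes/sec from two samples of a monotonically increasing counter."""
    if t2 <= t1:
        raise ValueError("samples must be in time order")
    return (c2 - c1) / (t2 - t1)

# Hypothetical samples 5 seconds apart: the counter grew by 276 MB
rate = counter_rate(1_000_000_000, 0, 1_276_000_000, 5)
print(f"{rate / 1e6:.1f} MB/s")  # 55.2 MB/s
```

The point is that the collector only does this division; if the underlying write_data counter jumps by a bogus amount, the reported rate is bogus too.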
If you see that the write_data value is unrealistically high, open a support case and suggest they look at:
bug 1048529 - "write_data value in volume stats is unreliable"
By the way, the QoS-based counters should still be accurate, so on the Volume dashboard you could also check the QoS rows and see whether there is a big difference between those counter values and the ones from the wafl/volume row on the same dashboard.
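That comparison can also be automated once you have both rates per volume. A minimal sketch, assuming you have already extracted per-volume throughput from the volume-object counters and from the QoS workload counters (the volume names and numbers below are hypothetical):

```python
# Sketch: flag volumes whose volume-object throughput diverges from the
# QoS workload throughput by more than a tolerance factor -- a symptom
# of the unreliable write_data counter described above.

def flag_divergent(volume_bps, qos_bps, factor=2.0):
    """Return names of volumes whose volume-object rate exceeds the
    QoS-based rate by more than `factor`."""
    suspects = []
    for name, v_rate in volume_bps.items():
        q_rate = qos_bps.get(name)
        if q_rate and v_rate > factor * q_rate:
            suspects.append(name)
    return suspects

# Hypothetical per-volume throughput in bytes/sec from the two sources
volume_bps = {"lun_2": 3_000_000_000, "vol_ok": 210_000_000}
qos_bps    = {"lun_2":   200_000_000, "vol_ok": 205_000_000}

print(flag_divergent(volume_bps, qos_bps))  # ['lun_2']
```

A 3GB/s volume-counter rate against a 200MB/s QoS rate, as in the original post, would be flagged immediately.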
Cheers,
Chris Madden
Solution Architect - 3rd Platform - Systems Engineering NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!
Hi @madden,
Thank you very much for your insightful reply. Indeed, our raw counters match the unrealistic numbers shown by Harvest. I have opened a support case suggesting they look at that bug. We can declare Harvest "not guilty" 🙂 Thanks for your help!
Cheers
Federico