It's been a week so I'm not sure if you're still looking for insight, but I see part of your issue here.
If you look carefully at your first image, with the raw Graphite data, you'll see that while on the left you've highlighted write_data, the legend of the graph shows the metrics:
...aggr.total_transfers and ...aggr.Node02_SSD.total_transfers. Remember that the Graphite interface adds each metric you double-click and removes it when you double-click it again. It's easy to end up looking at a bunch of unrelated items this way.
In your Grafana dashboard you're looking at Node.xxx.fcp.read_data and Node.xxx.fcp.write_data. These don't measure the same things: aggregate total transfers does not equal protocol reads + writes for a node. Each aggregate measures its own operations to and from disk, while the protocols (FCP, iSCSI, CIFS, NFS) each measure their operations to and from the client. Depending on what you want to measure you'd look at one, the other, or both together, but they will not show the same values.
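To make the distinction concrete, here are illustrative metric paths (the prefix and names are placeholders, not your actual tree): the aggregate counters live under each node's aggr branch, while the protocol counters live under each node's protocol branch.

```
# Backend: per-aggregate disk transfers (illustrative placeholders)
netapp.perf.<group>.<cluster>.node.<node>.aggr.<aggr_name>.total_transfers

# Frontend: per-node FCP client traffic
netapp.perf.<group>.<cluster>.node.<node>.fcp.read_data
netapp.perf.<group>.<cluster>.node.<node>.fcp.write_data
```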
None of the throughput values seem correct in Grafana, not just that one. I know I'm pushing 400~600 MB/s on average through my FAS8060, but Grafana is showing 0.4~1.2 MB/s. So something is off; I just have no clue where to look.
I just compared the values below in Grafana with Brocade Switch View for the physical ports in my environment. (Because I'm comparing to physical ports, I have to look at the node's physical port values, not the SVM LIF values, or I'd have to do a bunch of math. This is the metric shown on the left side in your initial Graphite screenshot, but not the metric you were actually displaying.)
I use the Network Port dashboard and reference these metrics in the Fibre Channel row. (Metrics captured by choosing edit on the graph).
In the netapp-harvest.conf file you will find a default key/value like this:
normalized_xfer = mb_per_sec
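In context it sits alongside the other defaults, something like this (a sketch assuming the stock file layout; the surrounding lines are illustrative, not your exact file):

```
[default]
# ... other default options ...
normalized_xfer = mb_per_sec    # emit all throughput counters in MB/s
```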
What it does is normalize all throughput numbers to MB/s, so in Graphite and Grafana you are viewing MB/s rather than the native units of the Data ONTAP counter manager counter being graphed. I found normalizing data to be a much easier way of working; you can always scale back to whatever unit you need for your use case.
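As a sketch of what that normalization amounts to (my own illustration, not Harvest's actual code): a counter reported natively in, say, bytes per second gets scaled to MB/s before the value is sent on to Graphite.

```python
# Illustrative sketch of throughput normalization (NOT Harvest's real code).
# Native Data ONTAP counters report rates in various units; with
# normalized_xfer = mb_per_sec they are all scaled to a common MB/s.

UNIT_TO_MB = {
    "b_per_sec": 1 / (1024 * 1024),
    "kb_per_sec": 1 / 1024,
    "mb_per_sec": 1.0,
}

def normalize_xfer(value, native_unit):
    """Convert a throughput counter from its native unit to MB/s."""
    return value * UNIT_TO_MB[native_unit]

# 524288000 B/s is 500 MB/s
print(normalize_xfer(524288000, "b_per_sec"))  # 500.0
```

With every poller normalized the same way, graphs from different clusters are directly comparable without per-panel unit math.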
Regarding throughput being off: sometimes it is just user confusion, because with cDOT the node that does the frontend protocol work is not necessarily the one that does the backend volume work. Depending on the object you're looking at, you may see frontend or backend numbers. The default "node" dashboard shows both, with rows like "protocol backend drilldown" and "FCP frontend drilldown".
So in the "frontend" views you see very detailed information about the IOPs arriving at that node. Those IOPs are translated into WAFL messages and sent to the backend (on the same or a different node) to be serviced. At the "backend" the messages are tagged with protocol but otherwise are only tracked as read/write/other, versus the much greater detail tracked at the "frontend" node. If all traffic is direct (IOPs arrive on a LIF on the same node that owns the volume) the "frontend" and "backend" numbers should agree, but if you have indirect traffic they will differ.
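A toy illustration of why the per-node frontend and backend numbers diverge with indirect traffic (node names and values are made up for this sketch):

```python
# Toy model: each I/O arrives on a LIF (counted at the frontend node) but is
# serviced by the node owning the volume (counted at the backend node).
# Node names and MB values are invented for illustration.
ios = [
    {"lif_node": "node01", "vol_node": "node01", "mb": 300},  # direct
    {"lif_node": "node01", "vol_node": "node02", "mb": 200},  # indirect
]

frontend, backend = {}, {}
for io in ios:
    frontend[io["lif_node"]] = frontend.get(io["lif_node"], 0) + io["mb"]
    backend[io["vol_node"]] = backend.get(io["vol_node"], 0) + io["mb"]

print(frontend)  # {'node01': 500}
print(backend)   # {'node01': 300, 'node02': 200}
```

Note the cluster-wide totals agree (500 MB either way), but the per-node split does not: node01's frontend shows 500 while its backend shows only 300, because 200 was serviced indirectly by node02.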
Maybe you can check your setup taking the above info into account and let us know if that helped?
--If it does, please also "accept as answer" the post that answered your question so that others will see the Q/A is answered.
Cheers, Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Great to hear! The default dashboards assume you normalize perf info to mb_per_sec and capacity info (from OCUM) to gb, so it was probably a copy/paste mistake between pollers of different server types. I can see myself doing this too, so I'll have to think about improving usability here...