According to the Counter Manager system documentation, counters must be monotonically increasing; in other words, a counter's value may only go up. It's like the odometer in a car: you check the value, wait a bit, check it again, and calculate the rate of change from the elapsed time and the change in the odometer. If the odometer goes backwards, well, that doesn't happen unless you're up to no good.
Anyway, back to ONTAP: if you ever find that the rate of change is negative, you assume a reset occurred, likely either a rollover of the counter (i.e. it reached the maximum value of its data type) or a reset (such as a system reboot). In that case you drop the negative sample, and on the next sample you can compute the change again.
The massive numbers in your screenshot make it appear as if the values went down temporarily, something like this:
Time:    T1      T2      T3                 T4
NFS OPS: 122400  123400  100                123600
Calc'd:  N/A     1000    -123300 (discard)  123500
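The discard-and-resume logic can be sketched like this (a minimal hypothetical helper, not Harvest's actual implementation):

```python
# Sketch of delta calculation over a monotonically increasing counter.
# A negative delta is assumed to mean the counter reset or rolled over,
# so that sample is discarded.

def counter_deltas(values):
    """Return the change between consecutive samples; None where the
    counter went backwards (discarded sample)."""
    out = []
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            out.append(None)   # reset/rollover: drop this sample
        else:
            out.append(cur - prev)
        # either way, the next delta is computed from the current sample
    return out

# The NFS OPS example from the table above:
print(counter_deltas([122400, 123400, 100, 123600]))
# → [1000, None, 123500]
```

Dividing each delta by the elapsed time between the two samples gives the rate (e.g. OPS per second), which is what Harvest ultimately reports.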
I've seen this sporadically at customer sites before but haven't had enough data to open a bug. If you run Harvest with the -v flag, it will record all the raw data received, and we can verify this behavior. The next thing to figure out is what system event caused it. Did anything happen at those timestamps? SnapMirror updates, maybe? Cloning?
OPM uses archive files from the system, which is a different collection method. It also uses presets, which are less granular. Since this is a timing issue, I could imagine those differences somehow avoid the problem.
Cheers, Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
I haven't seen this strange behavior from the CLI statistics command before, and it should be accurate regardless of cluster load. Could it be that you have nested QoS policies defined? For example, a policy applied at the SVM level and then also at the volume or LUN/file level? Such a config is not supported and might cause oddness like this.
For data not showing up in Grafana: if it is very low IO, it could be that the latency_io_reqd feature is kicking in. See here for more on it: