Hi @marcusgross
According to the Counter Manager system documentation, counters must be monotonically increasing; in other words, they must only ever go up. It's like the odometer in a car: you check the value, wait a bit, check it again, and calculate the rate of change from the elapsed time and the change in the odometer. If the odometer goes backwards, well, that doesn't happen unless you are up to no good.
Anyway, back to ONTAP: if the rate of change you calculate is ever negative, you assume the counter was reset, most likely because it rolled over (i.e. it reached the maximum value of its data type) or the system rebooted. In that case you drop the negative sample, and on the next poll you can compute your change again.
When I see massive numbers like those in your screenshot it appears as if the values went down temporarily, something like this:
Time:     T1        T2        T3                  T4
NFS OPS:  122400    123400    100                 123600
Calc'd:   N/A       1000      -123300 (discard)   123500
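To make the arithmetic concrete, here is a minimal sketch in Python (my own illustration, not Harvest's actual code) of that delta logic, using the sample values from the table above. Note how the dip at T3 is what produces the huge calc'd value at T4 once the bogus sample becomes the new baseline:

# Minimal sketch (not Harvest's actual code) of how a counter-based
# collector turns raw samples into per-interval deltas, discarding any
# negative delta as a presumed counter reset or rollover.

samples = [122400, 123400, 100, 123600]  # raw NFS OPS counter at T1..T4

previous = None
for t, value in enumerate(samples, start=1):
    if previous is None:
        print(f"T{t}: counter={value}, calc'd=N/A (first sample)")
    else:
        delta = value - previous
        if delta < 0:
            # Counter went backwards: assume a reset/rollover and discard.
            print(f"T{t}: counter={value}, calc'd={delta} (discard)")
        else:
            print(f"T{t}: counter={value}, calc'd={delta}")
    previous = value  # the reset sample still becomes the new baseline

Running this prints N/A, 1000, -123300 (discard), 123500, matching the table: the discarded negative sample is followed by one artificially large value, which is what shows up as the spike in your graph.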
I've seen this sporadically at customer sites before but haven't had enough data to open a bug. If you run Harvest with the -v flag it will record all the raw data received, so we can verify this behavior. The next thing to figure out is which system event caused it. Did anything happen at those timestamps? SnapMirror updates, maybe? Cloning?
OPM uses archive files from the system, which is a different collection method. It also uses presets, which are less granular. Since this looks like a timing issue, I could imagine those differences somehow avoid the problem.
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!