OPM 2.0RC1 broken counters in Graphite export

We just upgraded to OPM 2.0RC1 and in an audit I found 2 counters that are no longer exported to Graphite. I use the Graphite export extensively to find correlations between performance events.

  1. No longer getting volume-level other_latency! Other counters on the volume all work.
  2. The release notes say I get node-level latency and ops, but my other_ops counter is not reading anything. I am getting data on the other counters they say were added. Not sure what is going on, but I can work around it by subtracting read_ops and write_ops from total_ops.
  3. I also lost the node-level system_ops, but I suppose it has been replaced by total_ops.
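
For anyone who wants to apply the workaround in item 2 outside of Graphite, here is a minimal sketch in Python. It assumes the three node-level series have already been fetched as equal-length lists of samples aligned on the same timestamps, with `None` marking a missing sample. The function name `derive_other_ops` and the choice to clamp small negative results to zero are my own assumptions, not anything OPM does.

```python
from typing import List, Optional, Sequence


def derive_other_ops(
    total_ops: Sequence[Optional[float]],
    read_ops: Sequence[Optional[float]],
    write_ops: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Approximate other_ops as total_ops - read_ops - write_ops, sample by sample.

    Hypothetical helper: the inputs are assumed to be aligned on the same
    timestamps. A sample is None in the output if any input is None.
    """
    derived: List[Optional[float]] = []
    for total, read, write in zip(total_ops, read_ops, write_ops):
        if total is None or read is None or write is None:
            # Cannot derive a value when any of the three counters is missing.
            derived.append(None)
        else:
            # Assumption: clamp at zero so sampling jitter never yields negative ops.
            derived.append(max(total - read - write, 0.0))
    return derived
```

Graphite's `diffSeries()` render function does the same subtraction at query time, so this is only needed if you post-process the data yourself.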

I am running 8.2.3P2 Cluster-Mode.

Is NetApp aware of these?

Re: OPM 2.0RC1 broken counters in Graphite export

 

Hello Adroita:

We are investigating this issue. Thanks for bringing this to our attention.

-Avijit Sikder
Product Manager

Re: OPM 2.0RC1 broken counters in Graphite export

Looks like it's a regression in OPM 2.0 RC1. I've opened a burt for this.

Re: OPM 2.0RC1 broken counters in Graphite export

Adroita,

 

Regarding the 3 issues, here is the latest update.

  1. No longer getting volume-level other_latency! Other counters on the volume all work.
    >> We are trying to reproduce this in-house. So far we have not seen it with a fresh install, and we are now trying with an upgrade. Another SE reported the same issue after an upgrade. A burt has been opened for this.

  2. The release notes say I get node-level latency and ops, but my other_ops counter is not reading anything. I am getting data on the other counters they say were added. Not sure what is going on, but I can work around it by subtracting read_ops and write_ops from total_ops.
    >> The node total_ops counter (from ZAPI system_ops) was removed from ONTAP 8.2.3. We will make sure the 2.0 GA user documentation communicates this. The counter should work on 8.3 and up; it is missing only in the 8.2.x family, from 8.2.3 onward.

  3. I also lost the node-level system_ops, but I suppose it has been replaced by total_ops.
    >> You are right about this.
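
For anyone repeating the kind of audit described at the top of this thread after an upgrade, here is a rough Python sketch that lists the series that have gone silent. It assumes the input is the parsed JSON from Graphite's render API with `format=json`, which is a list of objects with a `target` name and `datapoints` as `[value, timestamp]` pairs. The function name `silent_targets` and the metric paths in the usage example are made up for illustration; they are not the paths OPM actually exports.

```python
from typing import Any, Dict, List


def silent_targets(render_response: List[Dict[str, Any]]) -> List[str]:
    """Return the targets whose datapoints are all null (or absent).

    Hypothetical helper: render_response is assumed to be the already-parsed
    JSON body of a Graphite render call made with format=json.
    """
    silent: List[str] = []
    for series in render_response:
        # Each datapoint is a [value, timestamp] pair; value is None when no data arrived.
        values = [point[0] for point in series.get("datapoints", [])]
        if all(value is None for value in values):
            silent.append(series["target"])
    return silent


# Usage with made-up metric paths and timestamps:
sample = [
    {"target": "example.node1.total_ops", "datapoints": [[120.0, 1000], [118.0, 1060]]},
    {"target": "example.node1.other_ops", "datapoints": [[None, 1000], [None, 1060]]},
]
print(silent_targets(sample))
```

Running the same check before and after the upgrade and comparing the two lists should show which counters stopped reporting.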