<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Wrong QoS counters in Active IQ Unified Manager Discussions</title>
    <link>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120738#M21643</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I set up several QoS policies with the limit set to INF.&lt;/P&gt;&lt;P&gt;It worked fine for a while (monitored with Harvest).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yesterday I noticed that &lt;EM&gt;some&lt;/EM&gt; counters are reporting nonsense values:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;qos statistics performance show -refresh-display true -rows 20

Policy Group             IOPS      Throughput    Latency
-------------------- -------- --------------- ----------
-total-               3654789    &lt;FONT color="#FF0000"&gt;61343.16MB/s&lt;/FONT&gt;    11.55ms
User-Best-Effort      3315730    56823.18MB/s    11.14ms
&lt;FONT color="#FF0000"&gt;SPLUNK&lt;/FONT&gt;                 141786      520.79MB/s    29.78ms
_System-Work            51393        6.29MB/s   151.00us
T1RESI                  44773     1990.19MB/s   323.00us
D1ECM                   41502     1875.59MB/s     2.53ms
saelkes3565             &lt;FONT color="#FF0000"&gt;36951&lt;/FONT&gt;           0KB/s    19.66ms
W5E                      6972       60.17MB/s     6.94ms
WE5                      3542       10.88MB/s    10.75ms
WQ5                      3169       10.76MB/s    10.86ms
I2I                      3132        9.42MB/s     9.24ms
W4Q                      2124       18.85MB/s     3.35ms
WQ4                      1487        9.15MB/s    15.28ms
T1MODEL                   814        3.11MB/s  1029.00us
_System-Best-Effort       750           0KB/s        0ms
S4D                       361        4.63MB/s   725.00us
T1INSTRA                  250       20.57KB/s   588.00us
MDMT                       18       36.00KB/s     1.73ms
TEO                        13           0KB/s   149.00us
P1ACRMDB                   12       16.00KB/s        0ms
T1WFM                       6       16.00KB/s   777.00us&lt;/PRE&gt;&lt;P&gt;The whole node is doing 61GB/s? With SATA&amp;nbsp;&lt;IMG id="smileyfrustrated" class="emoticon emoticon-smileyfrustrated" src="https://community.netapp.com/i/smilies/16x16_smiley-frustrated.png" alt="Smiley Frustrated" title="Smiley Frustrated" /&gt;&lt;/P&gt;&lt;P&gt;saelkes3565 is limited to 1000 IOPS (100% getattr).&lt;/P&gt;&lt;P&gt;SPLUNK is limited to 1000 IOPS (90% getattr).&lt;/P&gt;&lt;P&gt;In Grafana, most of the time I see no values for the nonsense counters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are running 8.3.1P1 here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The cluster is heavily loaded; could this be the problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Marcus&lt;/P&gt;</description>
    <pubDate>Wed, 04 Jun 2025 20:11:35 GMT</pubDate>
    <dc:creator>marcusgross</dc:creator>
    <dc:date>2025-06-04T20:11:35Z</dc:date>
    <item>
      <title>Wrong QoS counters</title>
      <link>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120738#M21643</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I set up several QoS policies with the limit set to INF.&lt;/P&gt;&lt;P&gt;It worked fine for a while (monitored with Harvest).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yesterday I noticed that &lt;EM&gt;some&lt;/EM&gt; counters are reporting nonsense values:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;qos statistics performance show -refresh-display true -rows 20

Policy Group             IOPS      Throughput    Latency
-------------------- -------- --------------- ----------
-total-               3654789    &lt;FONT color="#FF0000"&gt;61343.16MB/s&lt;/FONT&gt;    11.55ms
User-Best-Effort      3315730    56823.18MB/s    11.14ms
&lt;FONT color="#FF0000"&gt;SPLUNK&lt;/FONT&gt;                 141786      520.79MB/s    29.78ms
_System-Work            51393        6.29MB/s   151.00us
T1RESI                  44773     1990.19MB/s   323.00us
D1ECM                   41502     1875.59MB/s     2.53ms
saelkes3565             &lt;FONT color="#FF0000"&gt;36951&lt;/FONT&gt;           0KB/s    19.66ms
W5E                      6972       60.17MB/s     6.94ms
WE5                      3542       10.88MB/s    10.75ms
WQ5                      3169       10.76MB/s    10.86ms
I2I                      3132        9.42MB/s     9.24ms
W4Q                      2124       18.85MB/s     3.35ms
WQ4                      1487        9.15MB/s    15.28ms
T1MODEL                   814        3.11MB/s  1029.00us
_System-Best-Effort       750           0KB/s        0ms
S4D                       361        4.63MB/s   725.00us
T1INSTRA                  250       20.57KB/s   588.00us
MDMT                       18       36.00KB/s     1.73ms
TEO                        13           0KB/s   149.00us
P1ACRMDB                   12       16.00KB/s        0ms
T1WFM                       6       16.00KB/s   777.00us&lt;/PRE&gt;&lt;P&gt;The whole node is doing 61GB/s? With SATA&amp;nbsp;&lt;IMG id="smileyfrustrated" class="emoticon emoticon-smileyfrustrated" src="https://community.netapp.com/i/smilies/16x16_smiley-frustrated.png" alt="Smiley Frustrated" title="Smiley Frustrated" /&gt;&lt;/P&gt;&lt;P&gt;saelkes3565 is limited to 1000 IOPS (100% getattr).&lt;/P&gt;&lt;P&gt;SPLUNK is limited to 1000 IOPS (90% getattr).&lt;/P&gt;&lt;P&gt;In Grafana, most of the time I see no values for the nonsense counters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are running 8.3.1P1 here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The cluster is heavily loaded; could this be the problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Marcus&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jun 2025 20:11:35 GMT</pubDate>
      <guid>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120738#M21643</guid>
      <dc:creator>marcusgross</dc:creator>
      <dc:date>2025-06-04T20:11:35Z</dc:date>
    </item>
    <item>
      <title>Re: Wrong QoS counters</title>
      <link>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120751#M21647</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.netapp.com/t5/user/viewprofilepage/user-id/11404"&gt;@marcusgross&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I haven't seen this strange behavior from the CLI statistics command before, and its output should be accurate regardless of cluster load. &amp;nbsp;Could it be that you have nested QoS policies defined? &amp;nbsp;For example, a policy applied at the SVM level and another at the volume or LUN/file level? Such a configuration is not supported and might cause oddities like this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As for data not showing up in Grafana: if the IO rate is very low, it could be that the latency_io_reqd feature is kicking in. &amp;nbsp;See here for more on it:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.netapp.com/t5/OnCommand-Storage-Management-Software-Discussions/Harvest-Graphite-quot-spotty-quot-data/td-p/118734" target="_self"&gt;http://community.netapp.com/t5/OnCommand-Storage-Management-Software-Discussions/Harvest-Graphite-quot-spotty-quot-data/td-p/118734&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry I don't have a better answer. &amp;nbsp;If the problem persists, I recommend opening a support case.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;BR /&gt;Chris Madden&lt;/P&gt;&lt;P&gt;Storage Architect, NetApp EMEA (and author of Harvest)&lt;/P&gt;&lt;P&gt;Blog:&amp;nbsp;&lt;A href="http://blog.pkiwi.com/" target="_blank"&gt;It all begins with data&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;If this post resolved your issue, please help others by selecting&amp;nbsp;&lt;STRONG&gt;ACCEPT AS SOLUTION&lt;/STRONG&gt;&amp;nbsp;or adding a&amp;nbsp;&lt;STRONG&gt;KUDO &lt;/STRONG&gt;or both!&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 12:44:38 GMT</pubDate>
      <guid>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120751#M21647</guid>
      <dc:creator>madden</dc:creator>
      <dc:date>2016-06-29T12:44:38Z</dc:date>
    </item>
    <item>
      <title>Re: Wrong QoS counters</title>
      <link>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120757#M21650</link>
      <description>&lt;P&gt;Hi Chris,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We don't have nested QoS groups.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I will open a ticket with NetApp.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Marcus&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 13:41:28 GMT</pubDate>
      <guid>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120757#M21650</guid>
      <dc:creator>marcusgross</dc:creator>
      <dc:date>2016-06-29T13:41:28Z</dc:date>
    </item>
    <item>
      <title>Re: Wrong QoS counters</title>
      <link>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120928#M21686</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sometimes the QoS counters show the right values, sometimes not. OCPM works well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also noticed that there are some spikes on normal Harvest counters:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;IMG src="https://community.netapp.com/t5/image/serverpage/image-id/5587i094C55568394A2F6/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="spikes.png" title="spikes.png" /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jul 2016 15:19:14 GMT</pubDate>
      <guid>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120928#M21686</guid>
      <dc:creator>marcusgross</dc:creator>
      <dc:date>2016-07-04T15:19:14Z</dc:date>
    </item>
    <item>
      <title>Re: Wrong QoS counters</title>
      <link>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120934#M21688</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.netapp.com/t5/user/viewprofilepage/user-id/11404"&gt;@marcusgross&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;According to the Counter Manager system documentation, counters must be monotonically increasing; in other words, a counter's value must only ever increase. &amp;nbsp;It's kind of like the odometer in a car: you check the value, wait a bit, check it again, and calculate the rate of change from the time passed and the change in the odometer. &amp;nbsp;If the odometer goes backwards, well, that doesn't happen unless you are up to no good.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anyway, back to ONTAP: if you ever check and the rate of change is negative, you are to assume the counter was reset, likely from a rollover of the counter (i.e. it reached the max size of the data type) or a restart (like a system reboot). &amp;nbsp;In this case you drop the negative sample, and on the next one you can compute your change again.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I see massive numbers like those in your screenshot, it appears as if the values went down temporarily, so something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Time:      T1       T2       T3                  T4
NFS OPS:   122400   123400   100                 123600
Calc'd:    N/A      1000     -123300 (discard)   123500&lt;/PRE&gt;&lt;P&gt;The negative sample at T3 is discarded, but the next sample at T4 is then computed against the temporarily low T3 value, which produces the huge spike.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've seen it before sporadically at customer sites but haven't had enough data to open a bug. &amp;nbsp;If you run Harvest with the -v flag, it will record all the raw data received and we can verify this behavior. &amp;nbsp;The next thing to figure out is what system event caused it. &amp;nbsp;Did anything happen at those timestamps? &amp;nbsp;SnapMirror&amp;nbsp;updates maybe? &amp;nbsp;Cloning?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;OPM uses archive files from the system, which is a different collection method. &amp;nbsp;It also uses presets, which are less granular. &amp;nbsp;Since this is a timing issue, I could imagine that those differences somehow avoid the problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;BR /&gt;Chris Madden&lt;/P&gt;&lt;P&gt;Storage Architect, NetApp EMEA (and author of Harvest)&lt;/P&gt;</description>
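The sampling rule described in the reply above (compute the change between consecutive raw counter reads, and drop any sample whose change is negative) can be sketched as follows. This is only an illustrative sketch and not Harvest's actual code: the function name rates_from_samples, the (timestamp, value) tuple layout, and the one-second spacing in the usage note are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def rates_from_samples(samples: List[Tuple[float, int]]) -> List[Optional[float]]:
    """Turn raw (timestamp_seconds, counter_value) reads into per-second rates.

    Hypothetical helper: the first read has no predecessor, so its rate is
    None. A negative change is treated as a counter reset or rollover and
    that sample is dropped (None) instead of being reported.
    """
    if not samples:
        return []
    rates: List[Optional[float]] = [None]
    for (t_prev, v_prev), (t_curr, v_curr) in zip(samples, samples[1:]):
        delta = v_curr - v_prev
        elapsed = t_curr - t_prev
        if elapsed > 0 and delta >= 0:
            rates.append(delta / elapsed)
        else:
            # Counter went backwards (or time did not advance): discard.
            rates.append(None)
    return rates
```

Feeding in the four NFS OPS values from the reply at an assumed one-second spacing yields no rate for T1, 1000 for T2, a discarded sample for T3, and 123500 for T4. The last value is the spurious spike: the discard rule hides the dip itself, but not the recovery that follows it.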
      <pubDate>Mon, 04 Jul 2016 18:50:57 GMT</pubDate>
      <guid>https://community.netapp.com/t5/Active-IQ-Unified-Manager-Discussions/Wrong-QoS-counters/m-p/120934#M21688</guid>
      <dc:creator>madden</dc:creator>
      <dc:date>2016-07-04T18:50:57Z</dc:date>
    </item>
  </channel>
</rss>

