
latency discrepancies

kinoblahblah

Hi guys, 

 

When I query the OCUM API via https://<ocum ip>/rest/clusters/<cluster id>/volumes?dateTimeRange=LAST_1h&sort=latency~dsc&limit=1, I get:

"metricValue": 25.176199
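
For anyone who wants to script the same request, here is a minimal sketch that only assembles the URL shown above. The path and the three query parameters come from the request in this post; the host name and cluster id in the example call are hypothetical placeholders, and the helper name build_volumes_url is made up for illustration. It does not send the request or handle authentication.

```python
from urllib.parse import urlencode


def build_volumes_url(host, cluster_id, date_time_range="LAST_1h",
                      sort="latency~dsc", limit=1):
    """Assemble the volumes query URL used in this post (no request is sent)."""
    query = urlencode({
        "dateTimeRange": date_time_range,  # the only value seen in this thread
        "sort": sort,                      # sort by latency, descending
        "limit": limit,                    # return only the top entry
    })
    return f"https://{host}/rest/clusters/{cluster_id}/volumes?{query}"


# Hypothetical host and cluster id, for illustration only.
url = build_volumes_url("ocum.example.com", "1")
print(url)
```

Keeping the parameters as function arguments makes it easy to try other dateTimeRange values once they are known.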

When I check in OCUM at https://<ocum ip>/volumes/<volume id>/explorer, the latency chart shows

0

 

Finally, if I log in to our NetApp appliance and run the command

statistics volume show -volume <volume> -vserver <vserver>

I get the output

                   *Total Read Write Other  Read  Write Latency
Volume   Vserver      Ops  Ops   Ops   Ops (Bps)  (Bps)    (us)
-------- --------- ------ ---- ----- ----- ----- ------ -------
<volume> <vserver>     13    0     6     6     0 186841   43582

 

As you can see, these three numbers are wildly different...
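
Part of the gap is a unit difference: the CLI reports latency in microseconds (us). The sketch below converts the CLI figure to milliseconds so it can be compared with the API value. It assumes the API's metricValue is in milliseconds, which this thread does not confirm; the helper name us_to_ms is just for illustration.

```python
def us_to_ms(microseconds: float) -> float:
    """Convert a latency in microseconds to milliseconds."""
    return microseconds / 1000.0


cli_latency_us = 43582        # Latency (us) column from statistics volume show
api_metric_value = 25.176199  # metricValue from the REST call; unit ASSUMED to be ms

cli_latency_ms = us_to_ms(cli_latency_us)
print(f"CLI: {cli_latency_ms:.3f} ms, API: {api_metric_value:.3f} ms")
print(f"difference: {cli_latency_ms - api_metric_value:.3f} ms")
```

Even after conversion the two values do not agree, so units alone do not explain the discrepancy.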

I'm assuming the API value differs because of dateTimeRange=LAST_1h, but I can't find any documentation on other possible values for that parameter.

I'm assuming the value we should treat as the source of truth is the one from the NetApp appliance itself.

Can anyone suggest a way to get all three of these sources to show consistent, accurate information?
Thanks

2 REPLIES

paul_stejskal

There are two different counter sources: volume and workload_volume. If you use qos statistics volume performance show -volume XXXXX -vserver XXXXX, its output should match what OCUM reports. The volume counters are measured only at the WAFL layer, whereas workload_volume (which AIQUM/OCUM use) covers both the network layer (nblade) and the WAFL layer. That may be your issue.

 

Also, the statistics volume show command itself could be having other issues. As a workaround, you could try nodeshell stats volume, then stats stop. If the output doesn't look right, please open a support case and we can identify the relevant bug.

kinoblahblah

Thanks for your reply. Are there other values available for the dateTimeRange parameter of the OCUM API?

I suspect the API reports a different value because it picks the highest value from the last hour rather than the most recent (current) value.
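
To illustrate that hypothesis, here is a small sketch of how the same window of samples can yield very different numbers depending on whether you take the latest value, the mean, or the peak. The sample values are made up for illustration and are not OCUM data, and whether the API actually reports a peak over the window is only my guess.

```python
from statistics import mean


def summarize(samples):
    """Return the latest, mean, and peak of a list of latency samples (ms)."""
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "latest": samples[-1],   # what a "current value" view would show
        "mean": mean(samples),   # what an averaged view would show
        "peak": max(samples),    # what a "highest in window" view would show
    }


# Hypothetical one-hour window: mostly idle, with one short burst.
window_ms = [0.0, 0.0, 25.0, 0.0, 0.0]
summary = summarize(window_ms)
print(summary)
```

A mostly idle volume with one burst would show 0 as its current value while still having a high peak for the hour, which would fit what I am seeing between the chart and the API.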
