Because the CLI doesn't seem to be able to retrieve historical performance data (if it can, please let me know!), we've written a script that calls the CLI to collect raw counter data for different components, like so:
statistics show -raw -object volume -counter [list of common metrics like read_ops, write_ops, read_latency, etc.]
The script is scheduled to run commands like this through the CLI and save the output every 5 minutes. When we later go back through the saved records, we can take any two records for a component (e.g. a volume), subtract their raw counters, and divide the difference by the number of seconds between the two records to get that metric's rate over the period. Take read_ops as an example:
Object: volume
Instance: performance_test
Start-time: 12/17/2020 10:04:24
End-time: 12/17/2020 10:04:24
Scope: svm1
Counter Value
-------------------------------- --------------------------------
read_ops 1752
Object: volume
Instance: performance_test
Start-time: 12/17/2020 10:09:23
End-time: 12/17/2020 10:09:23
Scope: svm1
Counter Value
-------------------------------- --------------------------------
read_ops 4837
This is roughly what the CLI output looks like for two records. You can see they were recorded just one second shy of exactly 5 minutes apart. So if we ask ourselves, "what was the rate of read operations per second over those five minutes?", we can come up with an answer.
We subtract the old raw counter from the current raw counter and divide by the duration (roughly 5 minutes, or 300 seconds):
(4837 - 1752) / 300 = 3085 / 300 = 10.28333 read operations per second.
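To make the method concrete, here is a minimal sketch of the delta calculation in Python. The function name and timestamp format are my own choices for illustration; it computes the actual elapsed seconds from the two record timestamps (299 s in this example, since the samples are a second shy of 5 minutes) rather than assuming a fixed 300 s interval:

```python
from datetime import datetime

TIME_FMT = "%m/%d/%Y %H:%M:%S"  # matches the Start-time/End-time fields above

def counter_rate(old_time, old_value, new_time, new_value):
    """Per-second rate between two raw counter samples."""
    elapsed = (datetime.strptime(new_time, TIME_FMT)
               - datetime.strptime(old_time, TIME_FMT)).total_seconds()
    return (new_value - old_value) / elapsed

# The two read_ops records from the example output:
rate = counter_rate("12/17/2020 10:04:24", 1752,
                    "12/17/2020 10:09:23", 4837)
print(round(rate, 2))  # 3085 / 299 s ≈ 10.32 read ops/s
```

Using the true elapsed time instead of a rounded 300 s only shifts the result slightly here (10.32 vs. 10.28), but the gap grows if the scheduler drifts between samples.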
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Our issue is that while this makes sense to us, and we've verified that the method works for one volume, other volumes tend to show drastically higher rates than OnCommand Unified Manager (OCUM), which we're using to verify our calculations and ensure they are accurate. We take a component, calculate a metric's rate for a given time period, and then compare it to the OCUM performance data for that same period.
Questions:
1) Why would a comparison for one volume match OCUM performance data graphs, but not other volumes?
2) Is this method accurate/reliable?
3) Are there any mistakes in the calculations we are performing?
4) Is there a way to obtain historical performance data from the CLI?
Thanks,
Mark