Environment: 4-node Cluster w/8.1.1 and OCUM 5.1 in cluster mode
I am trying to create useful summarized performance data by extracting perf data with the dfm perf data retrieve command. All the data is on a 5-minute polling interval, which is fixed in 5.1 for cluster mode. I would like to create output for each hour. I can do that by adding a -s 3600 option on the command line. However, the -s description says "the last sample in each of those regions will be displayed", and that matches what I've been seeing (examples below). I find that ("the last sample in the period") useless. Why would I want to report on just what happened in the last 5-minute period of an hour (or pick your favorite -s value)?
I might want the max of a value in the hour, or the average or mean of a value in the hour, but I'm struggling to think of any case where "last sample" would be useful. I'm attempting to reduce the amount of data for a week from 2016 individual samples (every 5 minutes over 7 days) to more like 168 values (24x7).
So I started testing the -m and -S options. It appears the combination of -m mean and -S step returns what I'd like. However, I'd really like to obtain the mean and the max in the same table extract. It looks like I have to run separate dfm perf data retrieve commands to get first the mean, then the max, right?
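One workaround for the two-command problem is to retrieve the raw 5-minute samples once and compute both summaries yourself. Here is a hedged sketch in plain Python (not part of dfm; the function name and sample data are invented) that buckets (timestamp, value) pairs into fixed hourly windows and reports both mean and max per window:

```python
from statistics import mean

def hourly_summaries(samples, window=3600):
    """Group (epoch_seconds, value) samples into fixed windows and
    return (window_start, mean, max) for each window."""
    buckets = {}
    for ts, val in samples:
        start = ts - (ts % window)          # floor the timestamp to the window boundary
        buckets.setdefault(start, []).append(val)
    return [(start, mean(vals), max(vals))
            for start, vals in sorted(buckets.items())]

# Two hours of made-up 5-minute samples: timestamps 0, 300, 600, ...
raw = [(t * 300, float(t)) for t in range(24)]
for start, m, mx in hourly_summaries(raw):
    print(start, m, mx)
# prints: 0 5.5 11.0
#         3600 17.5 23.0
```

The same pass could also collect min, percentiles, or anything else, which avoids re-running the extract per statistic.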
Also, I do not understand this description of -S. What do simple, step and rolling return?
-S For computing on fixed size data, this defines the method to advance
the chunks of data. Valid values are simple, step and rolling. Default
value is simple. Valid only when a statistical-method is specified.
Simple and step methods are available for all statistical methods.
Rolling is valid only for mean statistical method.
My questions: could someone:
1. Confirm my thinking that -s 3600 -m mean -S step returns the mean (or average) value of all samples within the hour (3600-second sample window)?
2. Provide a better explanation of the -S options, and examples of when I might use each?
=== Here is an example to illustrate my questions. This looks at one counter (cpu_busy) over a 3-hour period, from 1am to 4am. ===
dfmc-atx$ dfm perf data retrieve -x timeindexed -o atxcluster01-04 -C system:cpu_busy -b "2013-03-04 01:00:00" -e "2013-03-04 04:00:00" # raw data over 3 hours
-m mean tells it to compute a statistical mean on the data.
-S step tells it to calculate the mean in steps rather than over the entire range of data (i.e. simple).
-s 3600 tells the command what the statistical step interval is. In this case, it will provide the mean for each 3600 seconds' worth of data (3600 seconds = 1 step).
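As a sanity check on the arithmetic (plain Python, not dfm output; the sample values are invented): with 300-second polling, a 3600-second step means each returned value is the mean of twelve raw samples.

```python
from statistics import mean

POLL = 300      # 5-minute polling interval, in seconds
STEP = 3600     # the -s value: one result per 3600-second step

samples = list(range(24))            # two hours of made-up 5-minute samples
per_step = STEP // POLL              # 12 samples fall in each step
means = [mean(samples[i:i + per_step])
         for i in range(0, len(samples), per_step)]
print(means)   # one mean per hour: [5.5, 17.5]
```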
As for the -S options:
"Simple" just applies a statistical metric to the entire range of data and gives you one result. For example, the mean across 3 days worth of data.
"Step" allows you to apply a statistical metric to a range of data in steps or chunks. For example, give me the mean for each hour across 3 days worth of data. You'd get multiple values back from this command.
"Rolling" allows you to apply a rolling average or a running average to a range of data. It is used to analyze a set of data points by creating a series of averages of different subsets of the full data set. It is only valid with the "mean" metric. Here is an example of a rolling average for every 600 seconds. To be honest, I'm not sure when you'd use a rolling average vs. a step.
We use the rolling average within PA, but I've never used it from the CLI. It's been helpful to have the rolling average displayed on top of the normal data to visualize a mid- to long-term trend, but I'm not sure when we would use this in the CLI.