Active IQ Unified Manager Discussions

Can OnCommand/PerfAdv do more complex calculations?

schepers

I would like an event to be generated when the following occurs:

  • volume-response-time > volume-response-time-threshold AND (volume-throughput (in either IOPS or MB/s) / volume-size) > normalized-throughput-threshold

The above requires a mathematical "divide by" operation.
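In other words, something like the following would need to be evaluated. A minimal Perl sketch, assuming placeholder names and values throughout (this is not Performance Advisor syntax):

#!/usr/bin/perl
use strict;
use warnings;

# All names and values below are placeholders, not Performance Advisor syntax.
my $volume_response_time = 12.5;   # ms (measured)
my $volume_iops          = 4200;   # volume throughput in IOPS (could be MB/s)
my $volume_size_gb       = 500;    # GB

my $vrt_threshold = 10;            # ms -- hypothetical threshold
my $nt_threshold  = 5;             # IOPS per GB -- hypothetical threshold

# The "divide by" step: normalize throughput by volume size.
my $normalized_throughput = $volume_iops / $volume_size_gb;

if ($volume_response_time > $vrt_threshold
        && $normalized_throughput > $nt_threshold) {
    print "ALERT\n";
}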

To make it more complex, the intention is to create performance tiers high, standard and low, where low is everything that is not high or standard. See picture.

On the X-axis is NT (Normalized Throughput, i.e. throughput divided by the size of the volume). On the Y-axis is the average response time of the volume (VRT).

In order to decide whether a data set that resides on a filer fits the low/standard/high tier, the following must be determined:

Will it be possible, per volume, to evaluate: volume-latency > volume-latency-Gold-threshold AND (volume-throughput (in either IOPS or MB/s) / volume-size) > normalized-throughput-threshold?

IF (Normalized Throughput > NT-low)
THEN
    IF (Volume-ResponseTime > VRT-low)
    THEN
        ALERT
    ELSE
        IF (Normalized Throughput > NT-standard)
        THEN
            IF (Volume-ResponseTime > VRT-standard)
            THEN
                ALERT
            ELSE
                IF (Normalized Throughput > NT-high)
                THEN
                    IF (Volume-ResponseTime > VRT-high)
                    THEN
                        ALERT
                    FI
                FI
            FI
        FI
    FI
FI
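Rendered as real code, the same decision tree becomes a set of nested conditionals. A minimal Perl sketch, where all threshold values are hypothetical illustration figures:

# Returns 1 (alert) or 0, mirroring the pseudocode above.
sub tier_alert {
    my ($nt, $vrt) = @_;   # normalized throughput, volume response time

    # Hypothetical tier thresholds, purely for illustration.
    my ($nt_low, $nt_standard, $nt_high)    = (1, 5, 20);   # IOPS per GB
    my ($vrt_low, $vrt_standard, $vrt_high) = (20, 10, 5);  # ms

    if ($nt > $nt_low) {
        if ($vrt > $vrt_low) {
            return 1;                                   # low-tier breach
        }
        elsif ($nt > $nt_standard) {
            if ($vrt > $vrt_standard) {
                return 1;                               # standard-tier breach
            }
            elsif ($nt > $nt_high && $vrt > $vrt_high) {
                return 1;                               # high-tier breach
            }
        }
    }
    return 0;
}

# e.g. tier_alert(8.4, 15) returns 1: NT is in the standard band and
# VRT exceeds the standard-tier response-time threshold.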

How could that be achieved?

Thank you

Yann

3 REPLIES

reide

Wow.  You've clearly got too much free time on your hands.    You're the only person I've ever seen correlate the throughput of a volume with the size of the volume. I'm not completely sure I understand the need for that calculation.  Anyway....

Performance Advisor can create complex thresholds where multiple thresholds must be breached before an event is generated. However, to the best of my knowledge, there is no ability to perform calculations WITHIN Performance Advisor. You could, though, do this outside of Performance Advisor using a script and the dfm CLI. Some ideas:

1)  Create a performance threshold on volume response time. Then, create an alarm for that threshold.  When the threshold is breached and the alarm is called, it could run a script which does the following:

  • determine which volume is the one that breached the threshold  (this is included in the event id)
  • We already know that response time for this volume has been breached, so we only need to collect the volume throughput and volume size to complete your calculation
  • dfm perf data retrieve -o <volume-id> -C volume:total_ops -d 120              # get the current number of total IOPS for the volume and put it into a variable
  • dfm report view volumes-capacity <volume-id>                                  # get the current size of the volume and place it into a variable
  • vol total_ops / volume capacity = normalized throughput
  • if normalized throughput > xxx, then take the necessary action (see the sketch after this list)
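A minimal Perl sketch of such an alarm script. Assumptions: the breaching volume's id arrives in an environment variable (DFM_SOURCE_ID here; confirm the exact name for your DFM version), and the output parsing matches your dfm version's column layout, which you should verify:

#!/usr/bin/perl
use strict;
use warnings;

# Assumption: DFM hands the breaching object's id to the alarm script in
# an environment variable; the exact name (DFM_SOURCE_ID here) may differ.
my $volume_id = $ENV{DFM_SOURCE_ID}
    or die "no source id found in the environment\n";

# Step 1: current total IOPS for the volume over the last 120 seconds.
my @perf = `dfm perf data retrieve -o $volume_id -C volume:total_ops -d 120`;
# The column layout is an assumption -- take the last field of the last line.
my ($total_ops) = (split ' ', $perf[-1])[-1];

# Step 2: current size of the volume from the capacity report.
my @report = `dfm report view volumes-capacity $volume_id`;
# Again an assumption: grab the first numeric field of the last line.
my ($size_gb) = $report[-1] =~ /([\d.]+)/;

# Step 3: normalized throughput = IOPS per GB.
my $nt = $total_ops / $size_gb;

# Step 4: compare against a (hypothetical) threshold and act.
my $nt_threshold = 5;   # IOPS per GB -- placeholder value
if ($nt > $nt_threshold) {
    print "volume $volume_id: normalized throughput $nt > $nt_threshold\n";
    # ... take the necessary action here, e.g. raise a custom event.
}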

adaikkap

As Earls already said, it is not possible to do this within PA. But DFM as a whole offers an automation framework called alarm scripts, which get executed when an event condition is breached. The framework provides environment variables to the script, so you can script whatever you like. The FAQ below gives a simple example of the environment variables and their details.

https://library.netapp.com/ecmdocs/ECMM1278650/html/faq/index.shtml#_7.5
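For instance, a script can start by dumping whatever the framework passed it. A tiny Perl sketch; the DFM_ prefix is an assumption, so check the FAQ above for the actual variable names:

#!/usr/bin/perl
use strict;
use warnings;

# Print every DFM-supplied environment variable so you can see what the
# alarm framework provides (prefix is an assumption -- see the FAQ).
for my $var (sort grep { /^DFM_/ } keys %ENV) {
    print "$var=$ENV{$var}\n";
}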

Note how the script must be listed in the alarm: it should include the full path of the script interpreter.

lnx~ # dfm alarm list
Alarm                 1
Group                 Global
Event Severity        All
Event Name            All
Time From             00:00
Time To               23:59
Email Addresses
Script Name           /usr/bin/perl /opt/script.pl
User Script Runs As   root

You can also generate a custom event of your own from the script using dfm event generate, as sketched below.
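From inside the alarm script, that could look like the following. The environment-variable name, the event name, and the argument order are all assumptions; verify them against your DFM version's CLI help before relying on this:

#!/usr/bin/perl
use strict;
use warnings;

# Event name and argument order are assumptions -- verify with the dfm CLI help.
my $volume_id = $ENV{DFM_SOURCE_ID} or die "no source id\n";
system('dfm', 'event', 'generate',
       'volume:normalized-throughput-breached', $volume_id) == 0
    or warn "dfm event generate failed: $?\n";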

Regards

adai

schepers

I took a short break, hence the delay in responding.

The question is meant to solve a problem in a shared storage environment where there are different users with different workloads.

Why throughput per GB (or any other capacity measure)?

Well, imagine an aggregate built with, e.g., 600 GB SAS drives in a shared storage environment. If an end-user orders a volume of, say, 10% of the size of that aggregate, then under a fair-share policy that user also gets access to 10% of the IOPS (or whatever throughput measure) the aggregate can deliver. For example, if the aggregate can sustain roughly 10,000 IOPS (an assumed figure), the fair share of that 10% volume is about 1,000 IOPS.

Nobody will complain if the average response time stays within the agreed boundaries, but if it goes over, you want to know which volume/user/application caused the (too) high workload. Once identified, that volume/user/application should either order more storage or be moved to a higher-performing storage tier.

Does that make sense?

Yann
