How To: Build a multilevel alert in Performance Advisor

by Frequent Contributor on 2012-02-23 02:56 PM, edited on 2014-09-25 01:26 PM by allison (Former NetApp Employee)

Updated: April 17, 2013. Updated templates are attached; please update your systems.

 


In this article I build on the excellent technical report TR-4090, Performance Advisor Features and Diagnosis (7-mode), which outlines what to monitor with Performance Advisor. If you have not read this paper, I highly recommend downloading it. It gives you a sound methodology and explains which counters really matter.

Monitoring a performance counter with a threshold is useful, but the typical threshold has a single value and a single severity level: error.

 

In this article, I'll show you how to build multilevel thresholds as defined in the Performance Advisor Default Thresholds document and assign alarms to those so you get notified as each is breached.

 

Building and applying a multilevel threshold is a four-step process:

  1. Build the template
  2. Set the severity levels
  3. Apply the template to objects to monitor
  4. Create alarms for notification

First, decide what you want to monitor. In our case, I'll use the latency for the CIFS protocol and build a single threshold template with three distinct thresholds, each with a different severity level. Here are the thresholds and the severities we'll be using.

 

Counter             Threshold                Severity
cifs:cifs_latency   > 15ms over 5 minutes    warning
cifs:cifs_latency   > 20ms over 5 minutes    error
cifs:cifs_latency   > 40ms over 5 minutes    critical
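To make the multilevel idea concrete, here is a small Python sketch of how the three rows above relate. It assumes each event is evaluated independently against the 5-minute average, so a single reading can breach several events at once. The names THRESHOLDS, breached_events, and worst_severity are hypothetical and not part of Performance Advisor.

```python
from __future__ import annotations

# Hypothetical model of NA_CIFS_Latency: three independent "upper" thresholds
# on cifs:cifs_latency, listed in ascending order of limit and severity.
THRESHOLDS = [
    ("CIFS_Latency_warn", 15.0, "warning"),
    ("CIFS_Latency_error", 20.0, "error"),
    ("CIFS_Latency_critical", 40.0, "critical"),
]


def breached_events(avg_latency_ms: float) -> list[tuple[str, str]]:
    """Return (event, severity) for every threshold the 5-minute average exceeds."""
    return [
        (event, severity)
        for event, limit_ms, severity in THRESHOLDS
        if avg_latency_ms > limit_ms  # strictly greater, matching "> 15ms" etc.
    ]


def worst_severity(avg_latency_ms: float) -> str:
    """Highest breached severity, or "normal" when no threshold is exceeded."""
    breached = breached_events(avg_latency_ms)
    # THRESHOLDS is ordered by severity, so the last breached entry is the worst.
    return breached[-1][1] if breached else "normal"


if __name__ == "__main__":
    for sample_ms in (12.0, 18.5, 25.0, 45.0):
        print(f"{sample_ms:5.1f} ms -> {worst_severity(sample_ms)}")
```

In this model a 25 ms average breaches both the warning and error events, and 45 ms breaches all three; worst_severity simply reports the highest one.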

 

Step 1: Build the template

 

Log in to the NetApp Management Console and select Performance Advisor. At the bottom of the navigation panel, click Setup. In the upper portion of the panel, click Threshold Templates. Click the Add icon in the main panel to start the Add Threshold Template Wizard.

Use the Next button to step through the wizard, completing this information:

 

Name:  NA_CIFS_Latency
Description:  CIFS Latency > 15/20/40ms over 5 min
Threshold Interval (seconds):  300

Event Name:  CIFS_Latency_warn
Object:  cifs
Counter:  cifs_latency
Type:  upper
Value:  15
Unit:  millisec

Event Name:  CIFS_Latency_error
Object:  cifs
Counter:  cifs_latency
Type:  upper
Value:  20
Unit:  millisec

Event Name:  CIFS_Latency_critical
Object:  cifs
Counter:  cifs_latency
Type:  upper
Value:  40
Unit:  millisec
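Because the three events differ only in name and value, a typo in one of them is easy to miss. The sketch below is a hypothetical pre-flight check, not a Performance Advisor feature, that you could run over your values before entering them in the wizard. It assumes "upper" thresholds listed from least to most severe, so values must rise from warning to critical, and that all events share the same object, counter, type, and unit. ThresholdEvent and check_ladder are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdEvent:
    """One event as entered in the Add Threshold Template Wizard."""
    event_name: str
    obj: str      # "Object" field, e.g. cifs
    counter: str  # e.g. cifs_latency
    type_: str    # "upper" in this template
    value: float
    unit: str     # e.g. millisec


def check_ladder(events: list[ThresholdEvent]) -> list[str]:
    """Return problems found; an empty list means the levels look consistent.

    Assumes "upper" thresholds listed from least to most severe, so values
    must strictly increase (a threshold that fires below a value would need
    the reverse ordering).
    """
    problems: list[str] = []
    shared = {(e.obj, e.counter, e.type_, e.unit) for e in events}
    if len(shared) != 1:
        problems.append(f"events do not share object/counter/type/unit: {sorted(shared)}")
    values = [e.value for e in events]
    if any(later <= earlier for earlier, later in zip(values, values[1:])):
        problems.append(f"values are not strictly increasing: {values}")
    names = [e.event_name for e in events]
    if len(set(names)) != len(names):
        problems.append(f"duplicate event names: {names}")
    return problems


# The Step 1 values, warning -> error -> critical.
NA_CIFS_LATENCY = [
    ThresholdEvent("CIFS_Latency_warn", "cifs", "cifs_latency", "upper", 15, "millisec"),
    ThresholdEvent("CIFS_Latency_error", "cifs", "cifs_latency", "upper", 20, "millisec"),
    ThresholdEvent("CIFS_Latency_critical", "cifs", "cifs_latency", "upper", 40, "millisec"),
]

if __name__ == "__main__":
    print(check_ladder(NA_CIFS_LATENCY) or "template looks consistent")
```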

 

 

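When completed, your template should contain the three events listed above: CIFS_Latency_warn (15 ms), CIFS_Latency_error (20 ms), and CIFS_Latency_critical (40 ms).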

 

Step 2: Set the Severity Levels

 

When you create an event in a template, Performance Advisor assigns it a severity of error. You need to modify the warning and critical events so that they have the correct severity. To do this, go to the command line on the OnCommand server.

 

Each performance 'event' consists of two conditions: a normal condition and a breached condition. The normal condition is the expected, non-error state, while the breached condition signals the abnormal state you are monitoring for.

 

To set the correct severity, we modify the breached condition of each event using the dfm eventType command. The syntax of the command is:

 

    dfm eventType modify -v <event-severity> <event-name>

 

where

 

     <event-name> = <event type>:<event>:<condition>

 

In our case, <event type> is perf because this is a performance event, <event> is the event name we created in the template, and <condition> is breached. The commands to set our warning and critical events to the correct severity are:

 

    C:\>dfm eventType modify -v critical perf:CIFS_Latency_critical:breached
    Modified event "perf:CIFS_Latency_critical:breached".

    C:\>dfm eventType modify -v warning perf:CIFS_Latency_warn:breached
    Modified event "perf:CIFS_Latency_warn:breached".

 

The commands to verify our modifications are:

 

    C:\>dfm eventType list perf:CIFS_Latency_warn:breached
    Event Name                                         Severity     Class
    -------------------------------------------------- ------------ ------------------
    perf:CIFS_Latency_warn:breached                    Warning      perf:CIFS_Latency_warn

    C:\>dfm eventType list perf:CIFS_Latency_critical:breached
    Event Name                                         Severity     Class
    -------------------------------------------------- ------------ ------------------
    perf:CIFS_Latency_critical:breached                Critical     perf:CIFS_Latency_critical
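If you change many severities, you may also want to check the results automatically. The Python sketch below assumes you capture the text of dfm eventType list yourself (for example by redirecting it to a file) and that it follows the three-column layout in the sample output above, with no spaces inside any column value. The function names and SAMPLE string are hypothetical.

```python
from __future__ import annotations


def parse_event_type_list(output: str) -> dict[str, str]:
    """Map event name -> severity from captured `dfm eventType list` output."""
    severities: dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have three fields (name, severity, class) and, following the
        # <event type>:<event>:<condition> syntax, a colon in the name. This skips
        # the header, the dashed separator, blank lines, and command prompts.
        if len(fields) == 3 and ":" in fields[0]:
            name, severity, _event_class = fields
            severities[name] = severity
    return severities


def severity_mismatches(output: str, expected: dict[str, str]) -> list[str]:
    """Describe every event whose listed severity differs from the expected one."""
    actual = parse_event_type_list(output)
    return [
        f"{name}: expected {want}, got {actual.get(name, 'missing')}"
        for name, want in expected.items()
        # dfm lists "Warning" while the modify command takes "warning".
        if actual.get(name, "").lower() != want.lower()
    ]


SAMPLE = """\
Event Name                                         Severity     Class
-------------------------------------------------- ------------ ------------------
perf:CIFS_Latency_warn:breached                    Warning      perf:CIFS_Latency_warn
"""

if __name__ == "__main__":
    print(severity_mismatches(SAMPLE, {"perf:CIFS_Latency_warn:breached": "warning"}))
```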

 

If you are creating a series of these events, put the severity changes in a batch file or shell script; it makes the job much quicker and easier.
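A Python version of such a script might look like the sketch below. It only wraps the dfm eventType modify -v <event-severity> <event-name> syntax shown above; the SEVERITIES mapping, the helper names, and the --apply flag are hypothetical, and it assumes Python is installed on the OnCommand server with dfm on the PATH. By default it only prints the commands (a dry run).

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical mapping: event name from the template wizard -> severity for its
# breached condition. Events that should stay at the default "error" are omitted.
SEVERITIES = {
    "CIFS_Latency_warn": "warning",
    "CIFS_Latency_critical": "critical",
}


def event_type_name(event: str, condition: str = "breached", event_type: str = "perf") -> str:
    """Build <event type>:<event>:<condition>, e.g. perf:CIFS_Latency_warn:breached."""
    return f"{event_type}:{event}:{condition}"


def modify_commands(severities: dict[str, str]) -> list[list[str]]:
    """One `dfm eventType modify -v <event-severity> <event-name>` argv per event."""
    return [
        ["dfm", "eventType", "modify", "-v", severity, event_type_name(event)]
        for event, severity in severities.items()
    ]


def main(apply: bool) -> int:
    """Print each command; also run it when apply is True (needs dfm on PATH)."""
    for argv in modify_commands(SEVERITIES):
        print(" ".join(argv))
        if apply:
            result = subprocess.run(argv)
            if result.returncode != 0:
                # Stop at the first command that reports a non-zero exit status.
                return result.returncode
    return 0


if __name__ == "__main__":
    status = main(apply="--apply" in sys.argv[1:])
    if status:
        sys.exit(status)
```

Run it once without arguments to review the commands, then again with --apply. Checking the result with dfm eventType list afterwards is still a good idea.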

 

 

Step 3: Apply the threshold template to what you want to monitor

 

We designed this template to monitor latency for the CIFS protocol, which is a controller-level value. Our next step is to apply the template to the controllers we wish to monitor.

 

Select the threshold template NA_CIFS_Latency, right-click, and choose Objects from the drop-down menu. In the Objects pane, select the controllers to monitor in the left panel and use the > button to move them to the right panel. Click OK to begin monitoring the controllers against the threshold.

Now the system generates a warning, error, or critical event in the log each time the corresponding threshold is breached, and a normal event when the counter returns below that threshold.
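The sketch below models that behavior for a single event. It is a simplification under stated assumptions: each sample stands in for the counter's value over one 300-second threshold interval, the event starts in the normal condition, and one event is logged per change of condition. condition_events is a hypothetical helper, not a DFM function.

```python
from __future__ import annotations


def condition_events(samples_ms: list[float], event: str, limit_ms: float) -> list[tuple[int, str]]:
    """Return (interval index, event-type name) each time the condition changes.

    Models one "upper" threshold: breached when a sample exceeds limit_ms,
    normal again when it drops back to or below it. Assumes the counter
    starts in the normal condition and one event is logged per transition.
    """
    emitted: list[tuple[int, str]] = []
    breached = False
    for i, value in enumerate(samples_ms):
        now_breached = value > limit_ms
        if now_breached != breached:
            condition = "breached" if now_breached else "normal"
            emitted.append((i, f"perf:{event}:{condition}"))
            breached = now_breached
    return emitted


if __name__ == "__main__":
    averages = [10.0, 18.0, 22.0, 19.0, 12.0]  # illustrative 5-minute averages
    print(condition_events(averages, "CIFS_Latency_warn", 15.0))
    print(condition_events(averages, "CIFS_Latency_error", 20.0))
```

With those illustrative averages, CIFS_Latency_warn breaches at the second interval and returns to normal at the fifth, while CIFS_Latency_error breaches only at the third interval and clears at the fourth.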

 

OK, we're almost done; only one more step!

 

Step 4: Add Alarms for the Events

 

Now that you are monitoring the thresholds and generating events in the logs, one step remains: telling the system which events should generate alarms (notifications).

 

In the Setup navigation panel, select Thresholds. In the main panel, select the threshold for which you want to generate alarms.

Hint: you can use the filter in the 'Event Name' and 'Object' columns to quickly locate the events. For our events, enter 'CIFS' in the filter box and the display will show only events beginning with 'CIFS'.

 

Once you have selected the threshold, right-click and choose Add Alarm from the drop-down menu. In the Add Alarm Wizard, fill in the fields to notify by e-mail, pager (SMS messaging), a script to execute, or an SNMP host to send a trap to. Click Next, then enter the time range during which the alarm is active and whether to repeat notifications. Click Next once more, review your choices, then click Finish to save the alarm. Congratulations, you're now set to monitor and alert.
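If you restrict the alarm to a time range, it helps to be clear about how a window that crosses midnight behaves. The Python sketch below is an illustration under assumptions, not DFM's actual scheduling logic: in_time_range treats the range as a half-open time-of-day window, and notification_times assumes repeat notifications fire at a fixed interval until the event returns to normal. Both function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta


def in_time_range(ts: datetime, start: time, end: time) -> bool:
    """True if ts's time of day is in [start, end); supports ranges crossing midnight.

    An equal start and end is treated as an empty window in this sketch.
    """
    t = ts.time()
    if start <= end:
        return start <= t < end
    # Overnight window such as 22:00-06:00.
    return t >= start or t < end


def notification_times(breached_at: datetime, cleared_at: datetime,
                       repeat: timedelta | None) -> list[datetime]:
    """Notification times while the event stays breached.

    Assumption: repeats fire every `repeat` until the event returns to normal.
    """
    times = [breached_at]
    if repeat is not None and repeat > timedelta(0):
        t = breached_at + repeat
        while t < cleared_at:
            times.append(t)
            t += repeat
    return times


if __name__ == "__main__":
    start = datetime(2013, 4, 17, 23, 0)
    print(in_time_range(start, time(22, 0), time(6, 0)))
    print(notification_times(start, start + timedelta(minutes=35), timedelta(minutes=15)))
```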

 

 

Quick Start

 

As a quick start, I have uploaded a pre-built set of threshold templates for OnCommand 5.0 that implement the basic multilevel thresholds outlined in the Performance Advisor Default Thresholds document mentioned at the beginning of this post. The file BasicMultiThresholdTemplates.zip contains these templates, ready to import into Performance Advisor.

 

Updated: Two changes as of April 17, 2013.

 

The NA_CPU_Busy_HA and NA_CPU_Busy_SGL templates are updated to use the "avg_processor_busy" counter which presents a more accurate picture of controller CPU utilization.

 

Threshold Template    Description
NA_CIFS_Latency       CIFS Latency > 15/20/40ms over 5 min
NA_CPU_Busy_HA        CPU busy > 50/70/90 Pct on controller in HA-Pair for 5 min
NA_CPU_Busy_SGL       CPU busy > 60/70/80 Pct on single controller system for 5 min
NA_DISK_Busy          Check for disks busy > 60/70/90 percent
NA_FCP_Latency        FCP avg latency > 10/20/30ms over 5 min
NA_ISCSI_Latency      iSCSI average latency > 15/20/30ms over 5min
NA_LUN_Latency        LUN average read/write latencies > 20/30/40ms for 5 min
NA_MAX_Disk_Busy      Maximum disk busy > 80/90/98 for 5 minutes
NA_NFSV3_Latency      NFS v3 average latency > 15/20/40ms over 5min.
NA_NFSV4_Latency      NFS v4 average latency > 15/20/40ms over 5min.
NA_SYS_Avg_Latency    Average latency across controller for all operations, 20/30/40ms

 

These are based on the monitoring thresholds outlined in the Default Thresholds document. They do not include application-specific thresholds, but using the steps in this article you should be able to create those easily.

 

Good luck and happy monitoring!

Comments
Frequent Contributor

Great stuff!

Especially the pre-built templates are very useful! Installed them right away.

adaikkap Former NetApp Employee

Phil,

I am going to incorporate these into the doc as well. :) Hope you don't mind.

Frequent Contributor

Not at all, I consider it an honor. I did not have time to finish building out the full set of templates to include Oracle, Exchange, etc. I also did not have time to build a simple script the user could run to change the severity levels. Maybe at a later date.

reide Former NetApp Employee

Good stuff.  Thanks to Adai, Nagendra and Phil we're finally going to have an up-to-date document on performance monitoring.  There is a lot of demand for this in the field.

Hello Guys,

I couldn't find the NA_Max_Disk_Busy.xml template. Is this normal?

Also, I tried to use the counter system:max_disk_busy in Performance Advisor graphs, but I cannot retrieve any collected data.

Thanks for your help.

Best regards,

Yannick

I am not able to download the document:

Performance Advisor Default Thresholds paper by Adaikkappan Arumugam and Nagendra Krishnappa

via the KB link posted by bachman. Is this a NetApp-internal-only KB?

Also, are the BasicMultiThresholdTemplates.zip templates for OnCommand 5 compatible with DFM 4.0.2D2?

Frequent Contributor

Rick,

My mistake: the link was to the KB, which has been marked as internal. I have updated the links in the blog to point directly to the Default Thresholds document. Let me know if you are able to download it.

Phil Bachman

Frequent Contributor

Rick,

The templates in the zip file were built using OnCommand 5.0. I'll check to see if the format of the XML files changed from OpsManager 4.0.x.

Phil

Thanks Phil!

Your revised link to the Performance Advisor Default Thresholds document works for me now.

Rick

I exported a threshold template I created in DFM 4.0.2D2, and at a quick glance the XML format looks identical to yours created in OnCommand 5.0.

One (possibly dumb) question: I see your site-specific filer objects in the XML, which I assume would just error out on import at a different site, but what about the DFM objectIDs? Would they potentially conflict with existing objectIDs in another site's DFM database, or do they get adjusted on import?

ZANDER_MEARS

This is really useful, thanks for the info!

regards

Zander

adaikkap Former NetApp Employee

Threshold templates are backward compatible.

BTW Rick, don't worry about the DFM object IDs; it's not a problem. I have already verified it.

Regards

adai

Hi Phil and Adai,

I am using OCUM v5.2 for c-mode and do not see the threshold template option for c-mode. What is the solution for c-mode to set up multilevel alerts?

Thanks


Glen

adaikkap Former NetApp Employee

Hi Glen,

Please take a look at the What's New in OnCommand Unified Manager 5.1 Release document.

As you will notice, this functionality is not available for PA in clustered Data ONTAP, as specified in Table 2 of that document.

Regards

adai

Hi Adai,

Understand that.

Do you mean the only choice for cDOT is to wait for the new version?

Thanks

Glen

adaikkap Former NetApp Employee

Yup. :(

Regards

adai

Hi Adai,

Got it!

Thanks

Glen

Phil, Great stuff. Keep it up.

Hello,

thank you for these tips.

When I follow step 4, the alarm is created, but its group is "Global" rather than the group of the object shown on the Thresholds tab.

Is this a bug?

Moreover, the alarm created by Performance Advisor appears in the OnCommand Console without all its information: the events are empty, while the alarm appears correctly in Performance Advisor.

Thank you for your help.

For info, I use OnCommand Unified Manager 5.2 and NetApp Management Console 3.3.
