ONTAP Discussions

Alert: Data ONTAP: Controller Latency Monitor Resolution state

anders_hansen

Hi

We are using the OnCommand management pack for MS SCOM. We are getting alerts like this:

Alert: Data ONTAP: Controller Latency Monitor

Source: Controller na06-cn2

Path:

Last modified by: System

Last modified time: 26-12-2011 21:07:25

Alert description: Please see the alert context for details.

Alert view link: "?DisplayMode=Pivot&AlertID=%7b720a99cd-80f6-4546-bbcc-2d29320c2fb1%7d"

Notification subscription ID generating this message: {EE248E23-BEB6-364C-3965-9B5F72B8EAE9}

What does that alert cover?

And how do I fix it?

The system is not overloaded: it uses about 10% CPU and has two 4243 shelves with 600GB SAS disks doing about 1,000 IOPS in total. It's an HA system, so half the disks are assigned to each controller.

I can't see anything unusual with "sysstat -x 1".

1 ACCEPTED SOLUTION

watan

Check out the OCPM IAG. It has more details on the monitors, and you can customize the overrides to match your environment or disable the monitors completely. Keep in mind that if you disable them, you will not get any alerts if an actual issue arises.

Data ONTAP: Volume Latency Rule

This rule triggers an alert based on when the average volume latency exceeds a critical threshold. This rule runs by default every 30 minutes.

Data ONTAP: LUN Latency Rule

This rule triggers an alert based on when the average LUN latency exceeds a critical threshold (default 500ms and 1000ms). This rule runs by default every hour.

I couldn't find anything about the Controller latency monitor but there is a report which should be similar.

Data ONTAP Controller Average System Latency Report

Displays the average I/O or network latency of the top five user-configurable storage controllers over a specific period of time. The default time period is from the first day of the month to the current day. This report assists you to determine if you can load-balance more effectively.
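
To make the rule descriptions quoted above concrete, here is a minimal sketch of how an average-latency check against two thresholds might work. It is not the management pack's actual logic: the function name is hypothetical, and treating 500ms as the warning level and 1000ms as the critical level is an assumption based only on the "default 500ms and 1000ms" wording.

```python
from statistics import mean

# Assumed thresholds in milliseconds. The guide text only says
# "default 500ms and 1000ms"; mapping them to warning/critical is a guess.
WARNING_MS = 500.0
CRITICAL_MS = 1000.0


def classify_latency(samples_ms, warning_ms=WARNING_MS, critical_ms=CRITICAL_MS):
    """Classify the average of the latency samples (hypothetical helper)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    average = mean(samples_ms)
    # "Exceeds" is read here as strictly greater than the threshold.
    if average > critical_ms:
        return "critical"
    if average > warning_ms:
        return "warning"
    return "healthy"
```

Because the check works on an average over the polling interval, a short spike can be diluted by quiet samples, which may be one reason an alert does not line up with what a live "sysstat -x 1" shows.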


5 REPLIES 5

jgunnarson

Did you ever figure anything out? We are having the same issue. I also get a similar error on the LUNs. I have opened a support ticket as well.

BGARDNER2001

Same issue here. Did you guys ever figure this out? I'm thinking of doing an override in SCOM, but I'm not really sure what else that could affect.

anders_hansen

Thank you for clearing that up

BGARDNER2001

Here is what I've found on the subject so far.

In SCOM, under Monitoring > Storage Systems > Performance > Latency, I saw spikes at specific times. These spikes correspond to snapshot backups that run at midnight. Since the spikes (ranging from 6ms to 13ms) only last a minute or two and occur during non-production hours, they are not really a concern for us. Because the alerts are annoying, though, I wanted to experiment with override values for them.
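
One way to confirm that every spike really lines up with the backup job is to filter exported latency samples by time of day. The sketch below is only an illustration: the sample data, the 5ms spike threshold, and the 23:45 to 00:30 backup window are all hypothetical, and the function names are made up for this example.

```python
from datetime import datetime, time


def in_window(ts, start, end):
    """True if ts falls inside the daily window [start, end).

    Handles windows that wrap past midnight, e.g. 23:45 to 00:30.
    """
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def spikes_outside_window(samples, threshold_ms, start, end):
    """Return (timestamp, latency_ms) pairs above threshold_ms outside the window."""
    return [
        (ts, ms)
        for ts, ms in samples
        if ms > threshold_ms and not in_window(ts, start, end)
    ]


# Hypothetical samples: (timestamp, average latency in ms).
samples = [
    (datetime(2011, 12, 26, 0, 1), 13.0),   # spike during the midnight backup
    (datetime(2011, 12, 26, 10, 0), 0.4),   # normal daytime latency
    (datetime(2011, 12, 26, 14, 30), 8.0),  # spike with no backup running
]

unexplained = spikes_outside_window(samples, 5.0, time(23, 45), time(0, 30))
print(unexplained)
```

Any sample that survives the filter is a spike the backup schedule does not explain, and is worth investigating before it gets hidden behind a higher threshold.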

The easiest way I found to set an override is to first figure out which value is generating the alerts. To do this, in SCOM go to Monitoring > Storage Systems > Dashboard, where you should see a pane labeled Alerts. It should show the LUN and Controller alerts for the past several days. Right-click an alert, go to Properties, and view the Alert Context. Take note of the Value; this should be the reported latency in ms for that alert. Look at several alerts and note the numbers. I found inconsistencies in mine: sometimes a .2 was sent as critical and a 1.8 as a warning. I'm still trying to figure this one out.

To set the actual override, in the same Alerts pane right-click the alert again, click Overrides, then Override the Monitor, and then choose either For all objects of class: Data ONTAP LUN or For a group. I don't know if it matters which one you choose, so you may want to experiment. You should now see a Lower Threshold and an Upper Threshold. The default values are 5 and 10, which appear to be equivalent to .5ms and 1ms. I selected an existing override management pack to store the overrides in, as recommended by the NetApp documentation. If you don't have one, you can click New to create one. Then, based on the values you got from your alerts, set your override values. I clicked the enforce button as well, though I don't really know if it makes a difference.
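
As a rough aid for picking the override numbers, the sketch below turns a handful of observed alert values into candidate Lower and Upper Threshold settings. Everything in it is an assumption: the function name is hypothetical, the factor of 10 threshold units per ms comes only from the guess above that 5 and 10 correspond to .5ms and 1ms, the 1.5x headroom is arbitrary, and keeping the lower threshold at half the upper one simply mirrors the 5/10 defaults.

```python
import math


def suggest_thresholds(observed_ms, headroom=1.5, units_per_ms=10):
    """Suggest (lower, upper) override values just above the benign spikes.

    observed_ms:  latency values (in ms) noted from the Alert Context.
    headroom:     multiplier applied to the highest benign spike (assumed).
    units_per_ms: threshold units per millisecond (assumed to be 10).
    """
    if not observed_ms:
        raise ValueError("need at least one observed value")
    upper_ms = max(observed_ms) * headroom
    lower_ms = upper_ms / 2  # same 1:2 ratio as the 5/10 defaults
    # Round before ceil so floating-point noise cannot bump the result up.
    lower = math.ceil(round(lower_ms * units_per_ms, 6))
    upper = math.ceil(round(upper_ms * units_per_ms, 6))
    return lower, upper


# Spikes of 6ms to 13ms, as seen during the midnight snapshots.
print(suggest_thresholds([6, 9, 13]))  # (98, 195)
```

If the unit guess turns out to be wrong, pass a different units_per_ms (for example 1 if the thresholds are plain milliseconds) rather than trusting these numbers as-is.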

This might not be a solution, but I hope it helps save someone some time in trying to figure this out.

If someone finds more info on the subject, please update this post.
