ONTAP Discussions

important-events

TMADOCTHOMAS

NetApp best practices, at least in earlier versions, recommend configuring event notifications on a cluster with the "important-events" filter. After upgrading from ONTAP 9.3 to 9.5 last week, I've noticed that this mechanism has become much more chatty, and not in a good way. For example, I now get two "NOTICE" severity alerts every time I create a new LUN. One says:

 

------------------------------

LUN.nvfail.option.on: The nvfail option for the volume <volume> has been turned on. The volume contains LUN or NVMe namespaces and it is recommended that this option be kept on for such volumes.

-------------------------------

 

The other says:

 

-------------------------------

LUN.space.resv.not.honored: Space reservations in volume idc_v_pobdb01t_dblogs1 (DSID 5359) are not honored, either because the volume space guarantee is set to 'none' or it is disabled due to lack of space in its containing aggregate.

-------------------------------

 

Any ideas on how to turn off particular annoying alerts like this but keep the others?

Mjizzini

You can modify the filter rules by severity (EMERGENCY, ALERT, ERROR, NOTICE) to get fewer alerts.

Event filter rule add

How to configure event filter and event notification for email destination

 

 

TMADOCTHOMAS

Thank you @Mjizzini ! After checking, my current settings appear to be the default. Unless I'm misreading the 'exclude' line, NOTICE events shouldn't generate email alerts, but they do. Am I misunderstanding it?

 

Also, there is actually ONE notice type, for autogrow, that I *do* want to get. I've been happy to see those alerts pop up so I'm aware of them. 

 

Should I delete the exclude rule and replace it with one that specifies NOTICES and other informational alerts, but put a new INCLUDE rule further up that includes the one for autogrow? Will that do the trick?

 

TMADOCTHOMAS_0-1605796836826.png

 

ttran

Hi Tmadocthomas,

 

You can add an exclude rule for NOTICE messages to the filter "important-events" and re-position the rule to the top. We don't recommend this, because self-correcting events, such as under/over temperature or under-voltage after a power outage, come through as NOTICE (e.g. chassisTemperature.ok). If you exclude those self-correcting messages, you might think an issue is still ongoing and have to investigate it. Creating a rule that moves NOTICE alerts to a folder is perhaps a better approach, just in case you do need to verify something.

 

Here is an example of what an exclude rule for NOTICE events should look like in your configuration:

cluster::> event filter show
Filter Name Rule     Rule      Message Name           SNMP Trap Type  Severity
            Position Type
----------- -------- --------- ---------------------- --------------- --------
important-events
            1        exclude   *                      *               NOTICE
            2        include   *                      *               EMERGENCY, ALERT
            3        include   callhome.*             *               ERROR
            4        exclude   *                      *               *
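
To make the effect of rule position concrete, here is a minimal Python sketch of first-match evaluation. It is not ONTAP code. It assumes that rules are checked in position order and that the first matching rule decides the outcome, and it ignores the SNMP trap type column. The rule list mirrors the table above, plus one hypothetical include rule for autosize events at position 1 (the variant asked about earlier in the thread). The event name wafl.vol.autoSize.done is used for illustration only.

```python
from fnmatch import fnmatchcase

# Each rule: (type, message-name pattern, severities). None stands for "*".
# Positions 2-5 mirror the example table; position 1 is a hypothetical
# include rule for autosize events, added for illustration.
RULES = [
    ("include", "wafl.vol.autoSize.*", None),
    ("exclude", "*", {"NOTICE"}),
    ("include", "*", {"EMERGENCY", "ALERT"}),
    ("include", "callhome.*", {"ERROR"}),
    ("exclude", "*", None),
]


def evaluate(rules, message_name, severity):
    """Return 'include' or 'exclude' from the first rule matching the event."""
    for rule_type, pattern, severities in rules:
        name_ok = fnmatchcase(message_name, pattern)
        severity_ok = severities is None or severity in severities
        if name_ok and severity_ok:
            return rule_type
    # Assumption: an event that matches no rule is not forwarded.
    return "exclude"


if __name__ == "__main__":
    for name, sev in [
        ("wafl.vol.autoSize.done", "NOTICE"),
        ("LUN.nvfail.option.on", "NOTICE"),
        ("chassisTemperature.ok", "NOTICE"),
    ]:
        print(name, sev, evaluate(RULES, name, sev))
```

Under these assumptions the autosize event is caught by the include rule before the NOTICE exclude is reached, while the other NOTICE events, including the self-correcting chassisTemperature.ok, are dropped.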
            

 

If you still run into issues after modifying the filter, you could be encountering bug 1045538 (EMS notification configuration is not synchronized across all nodes), where the configuration isn't propagated to all nodes.

 

The workaround for this is:

::> set advanced

::*> event config force-sync -node <node-name-not-applying-rule>

 

 

Regards,

 

Team NetApp

TMADOCTHOMAS

Thank you @ttran ! If this is still a best practice I don't want to change anything, however I'm not following why the two specific messages I mentioned in my primary post are being sent. They are purely informational, essentially telling me something I already know, and don't appear to add any value. Any thoughts on these two messages in particular? This only started after we upgraded from 9.3 to 9.5.

ttran

Hi Tmadocthomas,

 

I tested the "exclude" functionality in the lab and it is working as designed, where the two NOTICE level messages are excluded from the filter "important-events." Something you can check is if you have the filter "no-info-debug-events" configured to send events, as that does include severity level NOTICE.

 

You can view all your enabled filters using:

::*> event notification show
ID   Filter Name                     Destinations
---- ------------------------------  -----------------
1    default-trap-events             snmp-traphost
2    important-events                snmp-traphost
2 entries were displayed.

 

Here are my configured event filters:

::*> event filter show
Filter Name Rule     Rule      Message Name           SNMP Trap Type  Severity
            Position Type
----------- -------- --------- ---------------------- --------------- --------
default-trap-events
            1        include   *                      *               EMERGENCY, ALERT
            2        include   callhome.*             *               ERROR
            3        include   *                      Standard, Built-in *
            4        exclude   *                      *               *
important-events
            1        include   *                      *               EMERGENCY, ALERT
            2        include   callhome.*             *               ERROR
            3        exclude   *                      *               *
no-info-debug-events
            1        include   *                      *               EMERGENCY, ALERT, ERROR, NOTICE
            2        exclude   *                      *               *
test1
            1        include   *                      *               EMERGENCY, ALERT, ERROR
            2        exclude   *                      *               *
11 entries were displayed.

 

You can test whether a specific event is included or excluded by a specific filter using the following:

 

::*> event filter test -filter-name default-trap-events -message-name LUN.nvfail.option.on
The message-name "LUN.nvfail.option.on" is excluded from the given filter.

::*> event filter test -filter-name default-trap-events -message-name LUN.space.resv.not.honored
The message-name "LUN.space.resv.not.honored" is excluded from the given filter.

::*> event filter test -filter-name important-events -message-name LUN.nvfail.option.on
The message-name "LUN.nvfail.option.on" is excluded from the given filter.

::*> event filter test -filter-name important-events -message-name LUN.space.resv.not.honored
The message-name "LUN.space.resv.not.honored" is excluded from the given filter.

 

Running the same tests for both LUN events against the filter "no-info-debug-events" returns "included":

::*> event filter test -filter-name no-info-debug-events -message-name LUN.nvfail.option.on
The message-name "LUN.nvfail.option.on" is included in the given filter.

::*> event filter test -filter-name no-info-debug-events -message-name LUN.space.resv.not.honored
The message-name "LUN.space.resv.not.honored" is included in the given filter.
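
The test results above only say whether a filter includes an event. Whether anything is actually sent also depends on the filter being bound to a destination. The following Python sketch models that, using the "important-events" and "no-info-debug-events" rules from the listing above. It assumes first-match rule evaluation, ignores SNMP trap types, and simplifies notifications to a filter-name-to-destinations mapping. The destination name ops-email is hypothetical.

```python
from fnmatch import fnmatchcase

# Rule tuples: (type, message-name pattern, severities). None stands for "*".
FILTERS = {
    "important-events": [
        ("include", "*", {"EMERGENCY", "ALERT"}),
        ("include", "callhome.*", {"ERROR"}),
        ("exclude", "*", None),
    ],
    "no-info-debug-events": [
        ("include", "*", {"EMERGENCY", "ALERT", "ERROR", "NOTICE"}),
        ("exclude", "*", None),
    ],
}


def included(rules, message_name, severity):
    """True if the first matching rule is an include rule."""
    for rule_type, pattern, severities in rules:
        if fnmatchcase(message_name, pattern) and (
            severities is None or severity in severities
        ):
            return rule_type == "include"
    return False  # assumption: no matching rule means not forwarded


def destinations_for(notifications, message_name, severity):
    """Collect destinations of every notification whose filter includes the event."""
    hits = set()
    for filter_name, destinations in notifications.items():
        if included(FILTERS[filter_name], message_name, severity):
            hits.update(destinations)
    return sorted(hits)


# Lab setup from the post: only important-events is modeled with a destination.
lab = {"important-events": ["snmp-traphost"]}
# Hypothetical variant: no-info-debug-events is also bound to a destination.
with_debug = {**lab, "no-info-debug-events": ["ops-email"]}

print(destinations_for(lab, "LUN.nvfail.option.on", "NOTICE"))         # []
print(destinations_for(with_debug, "LUN.nvfail.option.on", "NOTICE"))  # ['ops-email']
```

In this model each filter is evaluated independently, so an exclude in one filter does not stop another filter with a destination from forwarding the same event.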

 

Here is a reference document for "event filter test" command: 

Event filter test 

 

 

Regards,

 

Team NetApp

TMADOCTHOMAS

@ttran this was super-helpful!! Thank you very much. My settings match yours, except I have one extra filter that also includes NOTICE alerts. This is the one that is generated when adding EMS events from OnCommand Unified Manager (or now Active IQ Unified Manager):

 

Filter Name Rule     Rule      Message Name                   SNMP Trap Type  Severity
            Position Type
----------- -------- --------- ------------------------------ --------------- --------
CITHQIFNACL01P-EBSCO-COM_filter
            1        include   objstore.host.*                *               *
            2        include   objstore.interclusterlifDown   *               *
            3        include   cloud.aws.*                    *               *
            4        include   qos.monitor.memory.*           *               *
            5        include   NVMeNS.*                       *               *
            6        include   fg.space.member.*              *               *
            7        include   fg.inodes.member.*             *               *
            8        include   arl.netra.ca.check.*           *               *
            9        include   gb.netra.ca.check.*            *               *
            10       include   monitor.vol.*                  *               *
            11       include   sms.*                          *               *
            12       include   wafl.vol.autoSize.*            *               *
            13       include   LUN.*                          *               *
            14       include   Nblade.*                       *               *
            15       include   cifs.shadowcopy.*              *               *
            16       include   fabricpool.*                   *               *
            17       include   osc.signatureMismatch          *               *
            18       include   nvmf.graceperiod.*             *               *
            19       exclude   *                              *               *

 

The "LUN.*" entry is what's doing it of course. It also shows up in no-info-debug-events, however I have no destinations set for that one so I think I'm getting the alerts from the Unified Manager one.
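
For anyone who wants to confirm which rule in a long filter is responsible for a given event, here is a small Python sketch that walks the include patterns of the Unified Manager filter listed above and reports the first one that matches. It assumes simple, case-sensitive wildcard matching on the message name, which may not be exactly how ONTAP matches, and it leaves out the final catch-all exclude rule.

```python
from fnmatch import fnmatchcase

# Include patterns of the Unified Manager filter, in rule-position order
# (positions 1-18; the final "exclude *" rule at position 19 is omitted).
UM_FILTER_PATTERNS = [
    "objstore.host.*",
    "objstore.interclusterlifDown",
    "cloud.aws.*",
    "qos.monitor.memory.*",
    "NVMeNS.*",
    "fg.space.member.*",
    "fg.inodes.member.*",
    "arl.netra.ca.check.*",
    "gb.netra.ca.check.*",
    "monitor.vol.*",
    "sms.*",
    "wafl.vol.autoSize.*",
    "LUN.*",
    "Nblade.*",
    "cifs.shadowcopy.*",
    "fabricpool.*",
    "osc.signatureMismatch",
    "nvmf.graceperiod.*",
]


def first_matching_rule(patterns, message_name):
    """Return (position, pattern) of the first include pattern that matches, or None."""
    for position, pattern in enumerate(patterns, start=1):
        if fnmatchcase(message_name, pattern):
            return position, pattern
    return None


if __name__ == "__main__":
    for name in ("LUN.nvfail.option.on", "LUN.space.resv.not.honored"):
        print(name, first_matching_rule(UM_FILTER_PATTERNS, name))
```

Both LUN messages from the original post resolve to the "LUN.*" rule at position 13, and because that rule's severity is "*", NOTICE events pass through it.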

 

So: I can delete that filter, but I do like being aware of truly critical EMS events. Any suggestions? Should I remove this from OCUM and then re-create it under 9.5 to see if it "cleans it up"?

Mjizzini

Removing it from OCUM and then re-creating it under 9.5 would be a good troubleshooting step.

TMADOCTHOMAS

Thanks @Mjizzini . It turns out that somewhere along the way, this filter lost its connection to Active IQ UM, so it was sort of acting as a 'standalone' filter anyway. I removed it and tried to re-add it from AIQUM, but it didn't work. I'll figure that out later. Thanks again.
