Active IQ Unified Manager Discussions

How to Temporarily Disable Operations Manager (DFM) Alerts for a Specific Host

MRJORDANG

Hello,

I have several new NetApp systems that are being set up and configured. I'd like to add them to Operations Manager (DFM) right away; however, they will be tested thoroughly, including reboots, cf takeovers/givebacks, LUNs going offline, and other operations that I would normally want to be alerted about, except on filers that are still being set up and configured.

Is there any way to add a filer to Operations Manager but flag it as being in "maintenance mode" within Operations Manager? The goal is to have Operations Manager track this filer and gather data, but not alert us on significant events.

Thanks,

Jordan

pradeepl

Hi Jordan,

You can achieve this as follows:

1) Create a resource group and add the filer to it.

2) Create an alarm for this resource group, say for event severity Critical.

3) Now disable this alarm.

This achieves your goal of having Operations Manager track the filer and gather data without alerting on significant events.

Once the maintenance is complete, you can enable the alarm.

Hope this helps.

Regards

Pradeep L

MRJORDANG

Thanks for the suggestion.  I'll give it a shot!

I was hoping there would be a simple way to flag the filer, but I'll give this a shot as well.

Thanks for the response!

pradeepl

Hi,

One more way to temporarily disable Operations Manager alerts for a specific host is:

Run dfm host delete <host-id/name>, and once the maintenance is done, add the host again using dfm host add <host-id/name>.

The problem is that no monitoring happens during this period: Operations Manager won't track the filer and won't gather any data while the host is deleted.
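
For anyone who wants to script the delete/re-add approach, here is a minimal Python sketch. It only uses the two commands mentioned above (dfm host delete and dfm host add) with the host name as the sole argument. The helper names and the host "filer01" are hypothetical, and whether re-adding a host needs extra options in your environment is something to check first. By default it is a dry run that just records the commands.

```python
import subprocess
from contextlib import contextmanager


def dfm_host_command(action, host):
    """Build the argv for 'dfm host delete <host>' or 'dfm host add <host>'."""
    if action not in ("delete", "add"):
        raise ValueError("action must be 'delete' or 'add'")
    return ["dfm", "host", action, host]


@contextmanager
def host_removed_from_dfm(host, run=False):
    """Delete the host from DFM on entry and add it back on exit.

    With run=False (the default) nothing is executed; the commands are
    only recorded, so the sequence can be reviewed first.
    """
    executed = []

    def _do(action):
        argv = dfm_host_command(action, host)
        if run:
            # Assumes the dfm CLI is on PATH on the machine running this.
            subprocess.run(argv, check=True)
        executed.append(argv)

    _do("delete")
    try:
        yield executed
    finally:
        # Re-add the host even if the maintenance step raised an error.
        _do("add")


# Dry run with a hypothetical host name.
with host_removed_from_dfm("filer01") as log:
    pass  # maintenance work (reboots, takeover/giveback tests) goes here

for argv in log:
    print(" ".join(argv))
```

The try/finally is there so the host is added back even when something fails mid-maintenance; it does not change the drawback noted above, since no data is collected while the host is deleted.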

By the way, did the previous approach work?

ijm2024

I wish I had seen the dfm host delete suggestion earlier. It appears that moving the filer into a new group and disabling the Critical alarm for that group does not work, because there is a top-level 'All - Critical or Worse' alarm on the Global group that still catches its events. We would have had to create a separate Critical alarm per subgroup (i.e. CRITICAL_SUB1, CRITICAL_SUB2, etc.), move the hosts into those subgroups, and enable the Critical alarm for each one; only then could we disable the top-level All - Critical alarm, since those events would be caught by the subgroup alarms instead. That is more work. Alternatively, we could add a Maintenance_SUB group and move the filer into it, but we would still need to create the other subgroups for the remaining hosts so their critical events are caught, and disable the top-level alarm. Unless there's another way, dfm host delete may be the better solution, given the alarm hierarchy.
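
To illustrate why the disabled group alarm does not help here, this is a toy Python model of the behavior described above. It is not DFM's actual implementation: the assumption that every host also belongs to Global, the simplified severity ordering, and the host name "new_filer" are all just for illustration.

```python
from dataclasses import dataclass

# Simplified severity ordering (an assumption of this model).
SEVERITY = {"Warning": 1, "Error": 2, "Critical": 3, "Emergency": 4}


@dataclass
class Alarm:
    group: str
    min_severity: str
    enabled: bool = True


def firing_alarms(host, severity, memberships, alarms):
    """Return the enabled alarms whose group contains the host."""
    # Model assumption: every host is implicitly a member of Global.
    groups = {"Global"} | memberships.get(host, set())
    return [
        a for a in alarms
        if a.enabled
        and a.group in groups
        and SEVERITY[severity] >= SEVERITY[a.min_severity]
    ]


memberships = {"new_filer": {"Maintenance_SUB"}}
alarms = [
    Alarm("Global", "Critical"),                          # 'All - Critical or Worse'
    Alarm("Maintenance_SUB", "Critical", enabled=False),  # the disabled group alarm
]

firing = firing_alarms("new_filer", "Critical", memberships, alarms)
print([a.group for a in firing])
```

In this model the Global alarm still fires for the filer under maintenance, because disabling the subgroup alarm only removes one of the two alarms that match the host.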

JamesIlderton

I'd suggest creating two top-level groups: Production and Maintenance. Then you can assign alerts to the Production group only and move systems over to the Maintenance group to avoid alerts while still capturing data. Using alerts from the root is a bad idea in general.
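
The Production/Maintenance layout can be sketched as a small Python model. This is only a way to reason about the design, not DFM code: the group contents, the host names "filer01" and "filer02", and both helper functions are hypothetical.

```python
def move_host(groups, host, target):
    """Move a host so that it belongs to exactly one top-level group."""
    for members in groups.values():
        members.discard(host)
    groups[target].add(host)


def should_alert(groups, alarmed_groups, host):
    """A host alerts only if it sits in a group that has alarms assigned."""
    return any(host in groups[name] for name in alarmed_groups)


groups = {"Production": {"filer01", "filer02"}, "Maintenance": set()}
alarmed_groups = {"Production"}  # no alarms on Maintenance or on the root

# filer02 goes under maintenance: it is still a known, monitored host,
# but no alarm applies to the group it now lives in.
move_host(groups, "filer02", "Maintenance")

for host in ("filer01", "filer02"):
    print(host, should_alert(groups, alarmed_groups, host))
```

Moving the host back to Production with the same helper restores alerting, so the maintenance window is a single group change in each direction.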

ijm2024

Thanks. I'm wondering why it's bad to use alerts from the root? I'm looking to present other options to management. Thanks in advance.

JamesIlderton

If you base your alerts on the Global "root" level, you can't stop alerts in the way you were asking about. Also, as you grow, you can use nested groups to control who gets what level of alerts based on site, application, or other groupings, while maintaining higher-level alerts and allowing for maintenance/lab groups with no alerting. Remember that you can create groups of Volumes, LUNs, Resource Pools, Aggregates, Qtrees, etc. to help narrow the focus (e.g. send alerts on Exchange resources to your Exchange admins). DFM is VERY noisy, so the more you can intelligently group your resources, the better you can isolate alerts to what is needed, so they are actionable and not just another email that gets filed (or deleted) and overlooked.

Unfortunately, OnCommand Unified Manager for Clustered Data ONTAP (DFM 6) doesn't work the same way: the alert settings list resources individually rather than using groups, so systems cannot easily be moved into maintenance like this. You're forced to set up your alerting very granularly so that you can disable an individual alert that contains the resources going under maintenance.

We're looking to move alerting to SolarWinds SRM now to have better control of this and of the content of the alert messages. It will still use OC Core/OCUM as the data source, but we can "unmanage" a system during maintenance (and even schedule it in advance!). Plus, the added advantage of using the AppStack to see upstream systems (VMware, UCS, switches, routers, applications, etc.) could give us a true single pane of glass for the environment in general.

ijm2024

Thanks James!
