Custom email notification DFM 5.2 alarm scripts

hadrian
6,031 Views

Hey World,

Has anyone integrated custom email notification scripts with DFM 5.x or 4.x, and would you be willing to share your work so far? I am looking at creating some alarms in DFM that, instead of emailing a recipient directly, call a script that evaluates certain criteria (such as time of day or event content) and then emails the user if the criteria are met. I'm not looking to reinvent the wheel, as someone has probably written this before.

Thanks,

Hadrian

7 REPLIES

JGPSHNTAP

I've been down this route once before. A word of caution: I would use only system-based events, because volume threshold events, for example, can fire multiple times and will keep triggering the script. My experiment failed because I was trying to automatically trigger perfstat based on custom thresholds I had put in place. That didn't end well. It was triggering the perfstat script about every 10 seconds against the same controller, so I failed miserably.
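One common guard against this kind of repeated triggering is a per-controller cooldown. The following is only a minimal sketch, not something from this thread: the function name `should_run`, the state directory, and the 600-second cooldown are all hypothetical, and the sketch assumes the controller name is safe to use as a file name.

```python
import os
import time


def should_run(controller, state_dir, cooldown_seconds, now=None):
    """Return True at most once per cooldown period for each controller.

    State is kept as the modification time of a small stamp file, so it
    survives across separate invocations of the alarm script.
    """
    now = time.time() if now is None else now
    # Hypothetical layout: one stamp file per controller name.
    stamp = os.path.join(state_dir, controller + ".last")
    try:
        last = os.path.getmtime(stamp)
    except FileNotFoundError:
        last = None
    if last is not None and now - last < cooldown_seconds:
        return False  # still inside the cooldown window, skip this trigger
    # Create or refresh the stamp and record this run's time.
    with open(stamp, "w"):
        pass
    os.utime(stamp, (now, now))
    return True
```

With a 600-second cooldown, a second event for the same controller ten seconds later would be skipped, while an event for a different controller would still run. Two invocations arriving at the very same instant could both pass this check, so it limits frequency rather than guaranteeing exclusivity.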

hadrian

Hi, thanks for the feedback. Here is what I'm trying to accomplish:

1. A performance threshold for latency or IOPS is breached, and an event occurs.

2. The alarm is tied to the performance breach event and kicks off the script.

3. The script checks what time of day it is and decides whether or not to send the email.

You may note that DFM alarms already have a time filter property, but apparently it only applies to repeat notifications, not to the first notification.
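The decision step in that flow could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the 08:00 to 18:00 window, the keyword list, and the `event` dictionary with a `name` field are all hypothetical, and how DFM actually hands event details to an alarm script is not covered here. Sending the mail itself is left out.

```python
from datetime import datetime, time


def in_window(now, start, end):
    """True if now's time of day falls in [start, end).

    A window whose start is later than its end (e.g. 22:00 to 06:00)
    is treated as crossing midnight.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_email(event, now, start=time(8, 0), end=time(18, 0),
                 keywords=("latency", "iops")):
    """Decide whether to notify, based on event content and time of day.

    `event` is a hypothetical dict of event fields; only "name" is used.
    """
    text = event.get("name", "").lower()
    matches = any(k in text for k in keywords)
    return matches and in_window(now, start, end)
```

For example, `should_email({"name": "Volume latency threshold breached"}, datetime(2024, 1, 1, 9, 0))` passes both checks, while the same event at 19:00 would be suppressed.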

Hadrian

GLENYU5820

Hi Hadrian,

I have a simple script running on a Linux server for my 7-mode and c-mode storage. The script just kicks off perfstat to gather filer performance data.

I rely on Performance Advisor alarms to trigger the script, which does not check the time of day before sending an email alert.

In Performance Advisor, I set up thresholds of 20 ms for latency and 70% for CPU.

Thanks

Glen

JGPSHNTAP

Glen - That's cool, but what happens if you get multiple alerts on the same filer? Multiple perfstats? That was my issue.

HENRYPAN2

Cool Glen,

Does your wonder script also run on Windows Server 2012?

Cheers

Henry

GLENYU5820

Hi JG,

While the current script is still running, a new breach does not kick off another perfstat collection.
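One common way to get this kind of behavior is an exclusive lock file. The sketch below is not the actual script from this thread (which was not posted); the function name `run_exclusive`, the lock path, and the command are hypothetical placeholders.

```python
import os
import subprocess


def run_exclusive(lock_path, command):
    """Run `command` only if no other run currently holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because the lock file already existed.
    """
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None  # a collection is already in progress
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record the owner for debugging
        return subprocess.call(command)
    finally:
        # Release the lock whether the command succeeded or failed.
        os.remove(lock_path)
```

If the process is killed outright, the lock file is left behind and later runs are skipped until it is removed by hand, so a real script would probably want some stale-lock handling.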

Hi Henry,

I don't think my Linux script would run on Windows.

Thanks

Glen

rmatsumoto

Are you kicking off perfstat collection to be reviewed by someone other than you? Also, do you want this because perf issues are more critical during certain hours and warrant further review, or do you just want a record of them?

The reason for the first question is that if the review depends exclusively on the counters, then DFM should be able to collect most, if not all, of them, provided your filers are running 7-mode. So you can just rely on the thresholds for the alerts and look at the counter data later, with data retention controlled by something other than a person managing perfstat files. If you need perfstat for in-depth review by you or NetApp, then I think there's no real substitute for collecting one. Our goal, though, was generally to get away from having to collect perfstats by collecting the counter data we rely on within DFM, so that we can do our normal checks quickly, and that has provided pretty good value. It has also helped with decision-making outside of break-fix, such as purchasing and provisioning decisions. We've gotten pretty good at avoiding creating a hot aggr, knowing how to spot one, knowing when to turn off dedupe, running POCs/bake-offs, etc., and we can address most of the problems on our own.

For the second question: if you just want to know when it happened and keep track of the hours in which the alerts are generated, you can do this from DFM as well. We actually do this to map out how many alerts each filer generates in each hour of the day. We put each filer in its own group, create a threshold, and run this command (with a PowerShell filter at the end; replace it with grep on Unix, I guess). Again, if you need perfstat for in-depth review, I don't think this is a substitute:

dfm event list -g GROUP_ID | select-string "name of your threshold" | select-string breached

That returns the events in which the object (filer) in that GROUP_ID breached the threshold, so counting the lines gives the number of breaches. The output also has timestamps for the last 6 months (that's if you're using 4.0.2; I heard the format changes in newer versions). The 65th and 66th characters on each line hold the hour, so this gives you the hours of the threshold breaches:

dfm event list -g GROUP_ID | select-string "name of your threshold" | select-string "breached" | %{$_.tostring().substring(64,2)}

Put the output in Excel and have it count how many times each value from 00 (that's 12 a.m.) to 23 shows up, and you have the hours that generate the most alerts for that group.
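If you'd rather skip the spreadsheet step, the same tally can be sketched in a few lines of Python. This is just an illustration: the function name `hour_histogram` is hypothetical, and the offset of 64 (the 65th and 66th characters) is the one described above for 4.0.2, so it would need adjusting if your version formats the output differently. It takes the full filtered event lines, not the already-trimmed two-character hours.

```python
from collections import Counter


def hour_histogram(lines, start=64, width=2):
    """Count breach events per hour of day from fixed-width event lines.

    Each line is expected to carry a two-digit hour at `start`; lines that
    are too short or hold something other than 00-23 there are ignored.
    """
    counts = Counter()
    for line in lines:
        hour = line[start:start + width]
        if len(hour) == width and hour.isdigit() and int(hour) < 24:
            counts[int(hour)] += 1
    # Return all 24 hours so quiet hours show up as zero.
    return [(h, counts.get(h, 0)) for h in range(24)]
```

Feeding it the saved output of the first command (one event per line) gives a list of (hour, count) pairs, which can then be sorted by count to find the busiest hours for that group.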
