2011-06-09 02:09 PM
My client wants to use an alarm, set to repeat notify, to create pages in a 3rd party app. They are going to set this 3rd party app to wait for the 3rd alarm from us before paging an administrator.
The first instance we are trying to cover is "SnapShot Full".
When measuring how many minutes 2 repeats (first alarm, plus 2 repeats equals 3 alarms) will cover, it's important to know the polling interval DFM is using to gather this data.
It's not clear whether SnapShot Full is being polled under the "Disk Free Space" monitor or the "File Systems" monitor (or the SnapShots monitor, but I think that only populates the list of snapshots, not their disk usage). By default the "Disk Free Space" monitor is a 15 minute poll, which would yield between 31 and 45 minutes of delay between the first alarm and the page to the admin. The "File Systems" monitor is 30 minutes, which would yield between 61 and 90 minutes. Those differences are large enough to concern the customer, and changing the interval of one monitor to match the other could have large consequences in our large-scale environment.
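One way to read the arithmetic above, as a rough sketch: assume the alarm is re-sent once per polling cycle and the page goes out on the third alarm. The function name `page_delay_window` and that once-per-cycle assumption are illustrative, not something DFM documents here. Under that reading the delay falls between two and three polling intervals, so the 31 and 61 quoted above would be the lower bound rounded up to just past 30 and 60.

```python
def page_delay_window(poll_minutes, alarms_needed=3):
    """Return (shortest, longest) delay in minutes before the Nth alarm.

    Assumption (not confirmed for DFM): one alarm is sent per polling
    cycle, and the condition can begin at any point within a cycle.
    """
    if poll_minutes <= 0 or alarms_needed < 1:
        raise ValueError("poll_minutes must be > 0 and alarms_needed >= 1")
    # Best case: the condition starts just before a poll, so only the
    # repeats (alarms_needed - 1 full cycles) add delay.
    shortest = (alarms_needed - 1) * poll_minutes
    # Worst case: the condition starts just after a poll, adding one
    # more full cycle before the first alarm.
    longest = alarms_needed * poll_minutes
    return shortest, longest


if __name__ == "__main__":
    for interval in (15, 30):
        print(interval, page_delay_window(interval))
```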
Is there a document that maps DFM events to the monitor that polls them? NOW and Wikid searches are not getting me very far.
2011-06-09 09:29 PM
Your concern is absolutely valid.
There are more than 400 events and about 37 monitoring controls. It would be useful to know which monitoring control affects which events.
But I don't think there is any document, wiki page, or other formal page that directly maps events to the monitors. At least I haven't seen one myself. If it's not there, maybe we should make one.
I'll update you on this soon.
2011-06-13 12:13 PM
So I set up a short Perl script which ran a report we already have in our environment that lists all the SnapReserve values for a storage appliance. I targeted a specific volume, and set my script to repeat that poll every 30 seconds.
I then did several specific snapshot deletions and tracked when those deletions were reflected in a lower value on the SnapReserve report. I ran the test 5 times, and on two occasions came up with values larger than 15 minutes (19 and 24 minutes, respectively). That leads me to believe that SnapShot Full events are being driven by the diskFreeSpace monitor, which defaults to a 30 minute polling interval.
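The original script was Perl and is not shown, so what follows is only a minimal Python sketch of the same measurement idea: poll a value every 30 seconds and record how long it takes to drop below its starting level. `wait_for_drop` and the `read_value` callable are hypothetical; in practice `read_value` would wrap whatever report supplies the SnapReserve figure.

```python
import time
from typing import Callable, Optional


def wait_for_drop(
    read_value: Callable[[], float],
    baseline: float,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll read_value until it falls below baseline.

    Returns the elapsed seconds when the drop is first seen, or None
    if timeout_seconds passes without a drop. sleep and clock are
    injectable so the loop can be exercised without real waiting.
    """
    start = clock()
    while clock() - start <= timeout_seconds:
        if read_value() < baseline:
            return clock() - start
        sleep(poll_seconds)
    return None
```

Start the timer right after deleting a snapshot; the returned value then approximates how long the monitor took to pick up the change, to within one 30 second poll.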
2011-06-13 09:13 PM
Good find, Matthew. We are preparing a document that will help with looking up this information. It will be available very soon.
2011-06-13 09:34 PM
Discovery of new snapshots, and deletion of discovered snapshots, is done by the snapshot monitor.
Any space-related info is gathered by the diskfreespace monitor.
Discovery of aggregates, volumes, and qtrees is done by the fs monitor.
vFilers by the vfiler monitor.
LUNs by the lun monitor.
Disk utilisation of all of these is gathered by the df monitor, and qtree utilisation by the quota monitor (if quotas are set).
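For quick reference, the list above could be kept as a small lookup table. This is only a sketch: the key strings and the `monitor_for` helper are made up for illustration, and the monitor names are the informal ones used in this thread, not official DFM option names.

```python
# Informal mapping transcribed from the list above.
# Keys are illustrative labels, not DFM event names.
MONITOR_FOR = {
    "snapshot discovery and deletion": "snapshot monitor",
    "space-related info": "diskfreespace monitor",
    "aggregate, volume, qtree discovery": "fs monitor",
    "vfiler discovery": "vfiler monitor",
    "lun discovery": "lun monitor",
    "disk utilisation": "df monitor",
    "qtree utilisation (quotas set)": "quota monitor",
}


def monitor_for(task: str) -> str:
    """Return the monitor listed for a task, or 'unknown' if not listed."""
    return MONITOR_FOR.get(task, "unknown")
```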