I am working with a customer to understand how DFM does its monitoring that leads to alerts. The customer would like to know the metric, threshold, and frequency that are gathered and compared for each of the DFM alerts listed in the attached spreadsheet. (The attached spreadsheet is built from a 'dfm eventtype list'.) An example might be an event called "Aggregate Full":
- Event: Aggregate Full
- Metric: Aggr full or aggr percent utilized (some specific value or number of values computed)
- Threshold: 95%
- Frequency: 5 min interval
I do not believe there is an easy answer, since an eventtype might be based on computations using a number of variables, and we don't define those explicitly. As for those with thresholds, we are aware of these:
[root@admin man1]# dfm options list | grep -i threshold
aggrFullThreshold 90
aggrNearlyFullThreshold 80
... etc
As for how frequently information is polled for such alerts, it seems to me that it comes from these interval settings:
[root@admin man1]# dfm options list | grep -i interval
agentMonInterval 2 minutes
autosupportMonInterval 2 minutes
... etc
But I am not sure, and don't know how to determine, what actual values are captured at these intervals. For example, sysInfoMonInterval shows as "1 hour" on a dfm server I am looking at. What data elements are gathered at the 1-hour interval, and what event-types use those elements? I do not know how to answer that. Another example might be the event "cfo-interconnect:down". How can I determine at what rate dfm polls for that? Is it part of the data collected once an hour (the sysInfoMonInterval), or something else?
Is there any place where event-types are mapped back to data-elements, and those data-elements matched to one of the dfm polling intervals I see in "dfm options list | grep -i interval"?
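In the meantime, here is a minimal sketch of how the threshold and interval options could at least be pulled into one table to sit next to the event list in the spreadsheet. The function name dfm_poll_summary is hypothetical (it is not part of DFM), and it assumes each line of "dfm options list" output has the form "<optionName> <value>", as in the excerpts above. It does not tell us which event-type uses which interval; it only collects the options in one place.

```shell
#!/bin/sh
# Hypothetical helper, not part of DFM.
# Reads "dfm options list" output on stdin and prints only the threshold
# and interval options as: kind<TAB>option<TAB>value
# Assumption: each option line looks like "<optionName> <value...>".
dfm_poll_summary() {
    awk '
        NF < 2 { next }                      # skip blank or value-less lines
        {
            name  = $1
            lname = tolower(name)
            if (lname ~ /threshold/)     kind = "threshold"
            else if (lname ~ /interval/) kind = "interval"
            else next                    # ignore every other option

            # Everything after the first field is the value, e.g. "2 minutes".
            value = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", value)

            printf "%s\t%s\t%s\n", kind, name, value
        }
    '
}
```

On the dfm server this would be run as "dfm options list | dfm_poll_summary > options.tsv" (the output file name is just an example), and the resulting tab-separated file can be pasted into the spreadsheet.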
Another dimension to the question is what method is used to gather this information for each of the intervals listed. I've been told that dfm uses SNMP and ZAPI (ONTAP API) to gather info, but nothing I see explains which is used for which. Is that documented somewhere?
Hopefully I've explained what we are looking for. I don't expect it to be easy, but I wonder if there are some hidden XML or config files that would show us answers, or partial answers, to some of these.