With the default settings of UM 5.1, a host must be down for a significant amount of time before a host down event/alert is produced. This behavior was changed in UM 5.2 under bug 614983 (no public report at this time).
OnCommand Unified Manager Core uses five different methods to identify if a host is down:
- echo
- http
- snmp
- ndmp
- echo_snmp <== default
The default behavior for a host down monitor run is a ping using ICMP echo and then snmpwalk. UM will retry each method a pre-configured number of times with varying timeouts, as seen below.
While the ICMP retries and timeouts have remained the same over the 5.x code line, the SNMP timeouts were increased in UM 5.1 for 7DOT and even more for 5.1 cDOT installations.
Due to changes under bug 614983, if pingMonTimeout is set to less than or equal to 5 seconds, then the SNMP timeout for host down (pingmon) monitoring will be 5 seconds. If pingMonTimeout is set to a value greater than 5 seconds, then pingMonTimeout is used as the SNMP timeout. The global monSNMPTimeout is used for all other SNMP connections. This applies to both 7DOT and cDOT versions of UM 5.2.
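The UM 5.2 rule above can be expressed as a small calculation. The following is only an illustrative sketch of that rule, not UM source code; the function name pingmon_snmp_timeout and the constant name are hypothetical and exist only for this example.

```python
# Illustrative sketch of the UM 5.2 rule described in the text; not UM code.
# The names below are hypothetical and chosen only for this example.
PINGMON_SNMP_FLOOR_SECONDS = 5  # threshold stated in the text


def pingmon_snmp_timeout(ping_mon_timeout: int) -> int:
    """Return the SNMP timeout (seconds) used for host down monitoring.

    A pingMonTimeout of 5 seconds or less yields 5 seconds; a larger
    pingMonTimeout is used as the SNMP timeout unchanged.
    """
    return max(PINGMON_SNMP_FLOOR_SECONDS, ping_mon_timeout)


if __name__ == "__main__":
    for value in (3, 5, 10):
        print(f"pingMonTimeout={value} -> SNMP timeout {pingmon_snmp_timeout(value)}s")
```

With the default pingMonTimeout of 3, this gives a host down SNMP timeout of 5 seconds, regardless of the global monSNMPTimeout.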
===============================================
UM 5.0.x default values:
monSNMPRetries 4
monSNMPTimeout 5
hostPingMethod echo_snmp
pingMonInterval 1 minute
pingMonRetryDelay 3
pingMonTimeout 3
===============================================
UM 5.1 7DOT default values:
monSNMPRetries 4
monSNMPTimeout 60
hostPingMethod echo_snmp
pingMonInterval 1 minute
pingMonRetryDelay 3
pingMonTimeout 3
===============================================
UM 5.1/5.2 cDOT default values:
monSNMPRetries 4
monSNMPTimeout 300
hostPingMethod echo_snmp
pingMonInterval 1 minute
pingMonRetryDelay 3
pingMonTimeout 3
===============================================
Therefore, if a clustered ONTAP controller is down for less than 5 minutes, UM 5.1 will not report it as down, because the outage would not have exceeded the first timeout value for the host down check. If the ping method is changed to echo or http, the node down event is logged.
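As a rough illustration of why short outages go unreported, the sketch below compares an outage duration against the default monSNMPTimeout values listed above. It is a deliberately simplified model: it assumes only the first SNMP timeout matters and ignores the ICMP retries, pingMonRetryDelay, and pingMonInterval. The dictionary and function names are hypothetical.

```python
# Simplified model, not UM code: an outage is assumed to be detectable by the
# echo_snmp method only if it lasts longer than the first SNMP timeout.
# Values are the default monSNMPTimeout settings listed in the tables above.
DEFAULT_MON_SNMP_TIMEOUT = {
    "UM 5.0.x": 5,
    "UM 5.1 7DOT": 60,
    "UM 5.1 cDOT": 300,
}


def outage_exceeds_first_snmp_timeout(profile: str, outage_seconds: int) -> bool:
    """Return True if the outage outlasts the first SNMP timeout for the profile."""
    return outage_seconds > DEFAULT_MON_SNMP_TIMEOUT[profile]


if __name__ == "__main__":
    outage = 240  # a 4 minute outage, in seconds
    for profile in DEFAULT_MON_SNMP_TIMEOUT:
        result = outage_exceeds_first_snmp_timeout(profile, outage)
        print(f"{profile}: outage of {outage}s exceeds first SNMP timeout: {result}")
```

Under this model, a 4 minute outage exceeds the first timeout on UM 5.0.x and UM 5.1 7DOT, but not on UM 5.1 cDOT.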
Changing monSNMPTimeout to the 5.0.x default value of 5 seconds allows UM to determine the host down status with the echo_snmp method. However, it is not recommended that this value be adjusted lower than the default for cDOT UM 5.1 servers, as some SNMP transactions can take a few minutes to complete and should not be sent multiple times within 5 minutes.