Active IQ Unified Manager Discussions
Recently there have been a couple of requests asking what we monitor and alert on in OM for C-Mode.
Below is the list of new events that we generate for C-Mode-specific objects (OM 4.0 and later).
Below is the same list in table format, with the event class.
Event Name | Severity | Class |
cluster-discovered | Information | cluster.discovered |
cluster-node-added | Information | cluster.node.added |
cluster-node-removed | Information | cluster.node.removed |
cluster-reachable | Normal | ping.status |
cluster-renamed | Information | cluster.renamed |
cluster-unreachable | Critical | ping.status |
port-role:changed | Information | port.roleChange |
port-status:down | Error | port.status |
port-status:undef | Normal | port.status |
port-status:unknown | Normal | port.status |
port-status:up | Normal | port.status |
logical-interface-status:down | Error | lif.status |
logical-interface-status:unknown | Normal | lif.status |
logical-interface-status:up | Normal | lif.status |
logical-interface:migrated | Warning | lif.migration |
vserver-deleted | Information | vserver.deleted |
vserver-discovered | Information | vserver.discovered |
vserver-renamed | Information | vserver.renamed |
vserver-running | Information | vserver.running |
vserver-stopped | Information | vserver.stopped |
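As a rough illustration of how the table above can be used when deciding what to alert on, here is a minimal Python sketch. It parses a few rows copied from the table, picks out the events with severity Error or Critical, and groups events by class. The function names and the choice of which severities count as alert-worthy are assumptions for illustration only; they are not part of OM.

```python
from collections import defaultdict

# A few rows copied from the table above: event name | severity | class |
TABLE = """\
cluster-reachable | Normal | ping.status |
cluster-unreachable | Critical | ping.status |
port-status:down | Error | port.status |
port-status:up | Normal | port.status |
logical-interface-status:down | Error | lif.status |
logical-interface:migrated | Warning | lif.migration |
vserver-stopped | Information | vserver.stopped |
"""


def parse_events(text):
    """Return a list of (name, severity, event_class) tuples."""
    events = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        cells = [cell for cell in cells if cell]  # drop the empty cell after the trailing pipe
        if len(cells) == 3:
            events.append(tuple(cells))
    return events


def group_by_class(events):
    """Map each event class to the (name, severity) pairs that share it."""
    grouped = defaultdict(list)
    for name, severity, event_class in events:
        grouped[event_class].append((name, severity))
    return dict(grouped)


events = parse_events(TABLE)

# Assumption: treat Error and Critical as the severities worth alerting on.
alert_worthy = [name for name, severity, _ in events if severity in {"Error", "Critical"}]
by_class = group_by_class(events)

print(alert_worthy)
print(by_class["ping.status"])
```

Grouping by class shows that the events sharing a class are the different states of one condition, for example cluster-reachable and cluster-unreachable under ping.status.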
Below is the list of common events that are generated for C-Mode objects as well as 7G/7-Mode objects.
Event Group | Event |
Volume | volume-almost-full |
Volume | volume-clone:deleted |
Volume | volume-clone:discovered |
Volume | volume-full |
Volume | volume-growth-rate:abnormal |
Volume | volume-growth-rate:ok |
Volume | volume-new-snapshot |
Volume | volume-offline-or-destroyed |
Volume | volume-online |
Volume | volume-snapshot-deleted |
Volume | volume-space-normal |
Volume | inodes-almost-full |
Volume | inodes-full |
Volume | inodes-utilization-normal |
Aggregate | aggregate-almost-full |
Aggregate | aggregate-almost-overcommitted |
Aggregate | aggregate-full |
Aggregate | aggregate-not-overcommitted |
Aggregate | aggregate-overcommitted |
Aggregate | aggregate-snapshot-reserve-almost-full |
Aggregate | aggregate-snapshot-reserve-full |
Aggregate | aggregate-snapshot-reserve-ok |
Aggregate | aggregate-space-normal |
Aggregate | aggregate:deleted |
Aggregate | aggregate:discovered |
Aggregate | aggregate:failed |
Aggregate | aggregate:offline |
Aggregate | aggregate:online |
Aggregate | aggregate:restricted |
NVRAM | nvram-battery:discharged |
NVRAM | nvram-battery:fully-charged |
NVRAM | nvram-battery:low |
NVRAM | nvram-battery:missing |
NVRAM | nvram-battery:normal |
NVRAM | nvram-battery:old |
NVRAM | nvram-battery:overcharged |
NVRAM | nvram-battery:replace |
NVRAM | nvram-battery:unknown-status |
CPU | cpu-load-normal |
CPU | cpu-too-busy |
Enclosures | enclosures-active |
Enclosures | enclosures-disappeared |
Enclosures | enclosures-failed |
Enclosures | enclosures-found |
Enclosures | enclosures-inactive |
Enclosures | enclosures-ok |
Fans | fans:many-failed |
Fans | fans:normal |
Fans | fans:one-failed |
Host | host-discovered |
Host | host-down |
Host | host-login:failed |
Host | host-login:ok |
Host | host-snmp-not-responding |
Host | host-snmp-ok |
Host | host-up |
Host | host:identity-conflict |
Host | host:identity-ok |
Host | host:name-changed |
Host | host:system-id-changed |
Power supplies | power-supplies:many-failed |
Power supplies | power-supplies:normal |
Power supplies | power-supplies:one-failed |
Snapshots | snap-count:exceeded |
Snapshots | snap-count:ok |
Snapshots | snapshot-full |
Snapshots | snapshot-space-ok |
Snapshots | snapshots:disabled |
Snapshots | snapshots:enabled |
Snapshots | snapshots:not-too-old |
Snapshots | snapshots:too-old |
Environmentals | temperature-hot |
Environmentals | temperature-normal |
Regards
adai.
Hi,
I have questions about some of the events listed above...
[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.
>>> Does this refer to the 'cluster-mgmt' LIF? Can we set the node-mgmt LIF to be used in case the cluster-mgmt LIF is not available?
[Info] Cluster Renamed (cluster-renamed): A cluster got renamed.
>>> If this event is raised after the node is renamed but before the reboot, then the event should be marked as pending. The action is not complete until the node is rebooted.
[Normal] Port Status Up (port-status-up): A cluster port status is up.
[Error] Port Status Down (port-status-down): A cluster port status is down.
[Normal] Port Status Undefined (port-status-undef): A cluster port status is undefined.
[Normal] Port Status Unknown (port-status-unknown): A cluster port status is unknown.
[Info] Port Role Changed (port-role-changed): A cluster port role has changed.
>>> Do we have similar events for ports that are in other roles?
Hi Mrinal Devadas,
Please find my answers inline, prefixed with [Saravanan]:
[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.
>>> Does this refer to the 'cluster-mgmt' LIF? Can we set the node-mgmt LIF to be used in case the cluster-mgmt LIF is not available?
[Saravanan] Up to DFM 4.0.1, you can use either the 'cluster-mgmt' LIF or the node-mgmt LIF as your primary address for monitoring. If one address is not reachable, you can always switch to the other using "dfm host set <hostid> hostprimaryaddress=<>"
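To make the switch-over concrete, here is a minimal Python sketch that only assembles the "dfm host set" command quoted above and prints it; it does not run anything. The helper name, the host ID 42, and the address 10.0.0.20 are hypothetical values for illustration, and the option name is spelled exactly as in the reply.

```python
import shlex


def build_set_primary_address(host_id, address):
    """Return the argument list for the 'dfm host set' command quoted above."""
    return ["dfm", "host", "set", str(host_id), f"hostprimaryaddress={address}"]


# Hypothetical host ID and node-mgmt LIF address, for illustration only.
cmd = build_set_primary_address(42, "10.0.0.20")
command_line = shlex.join(cmd)
print(command_line)
```

On the DFM server itself, the same argument list could be passed to subprocess.run(cmd, check=True) once the real host ID and address are filled in.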
[Info] Cluster Renamed (cluster-renamed): A cluster got renamed.
>>> If this event is raised after the node is renamed but before the reboot, then the event should be marked as pending. The action is not complete until the node is rebooted.
[Saravanan] I don't think a reboot is a mandatory requirement for the rename feature. If it is, please share some data. I'll verify this in DFM and let you know.
[Normal] Port Status Up (port-status-up): A cluster port status is up.
[Error] Port Status Down (port-status-down): A cluster port status is down.
[Normal] Port Status Undefined (port-status-undef): A cluster port status is undefined.
[Normal] Port Status Unknown (port-status-unknown): A cluster port status is unknown.
[Info] Port Role Changed (port-role-changed): A cluster port role has changed.
>>> Do we have similar events for ports that are in other roles?
[Saravanan] It's for all ports. There are no limitations based on role.
[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.
>>> Does this refer to the 'cluster-mgmt' LIF? Can we set the node-mgmt LIF to be used in case the cluster-mgmt LIF is not available?
I would suggest:
1. You can configure an alternate IP address for the cluster management LIF, or
2. If your cluster-mgmt LIF is not reachable because of a port being down or node reachability issues, you can configure a failover policy for this LIF so that it can fail over to any available port within the cluster.
Thank you for the answers.