What Events Does Operations Manager Generate for C-Mode (8.0/8.0.1) in 4.0/4.0.1?

adaikkap · 2011-01-14

Recently there have been a couple of requests about what we monitor and alert on in OM for C-Mode.

Below is the list of new events that we generate for C-Mode-specific objects (OM 4.0 and later):

[Info] Cluster Discovered (cluster-discovered): A cluster was discovered.
[Normal] Cluster Reachable (cluster-reachable): A cluster was reachable from DataFabric Manager network.
[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.
[Info] Cluster Renamed (cluster-renamed): A cluster got renamed.
[Info] Cluster Node Added (cluster-node-added): A node was added to a cluster.
[Info] Cluster Node Removed (cluster-node-removed): A node was removed from a cluster.
[Normal] Port Status Up (port-status-up): A cluster port status is up.
[Error] Port Status Down (port-status-down): A cluster port status is down.
[Normal] Port Status Undefined (port-status-undef): A cluster port status is undefined.
[Normal] Port Status Unknown (port-status-unknown): A cluster port status is unknown.
[Info] Port Role Changed (port-role-changed): A cluster port role has changed.
[Normal] Logical Interface Status Up (logical-interface-status-up): A logical interface status is up.
[Error] Logical Interface Status Down (logical-interface-status-down): A logical interface status is down.
[Normal] Logical Interface Status Unknown (logical-interface-status-unknown): A logical interface status is unknown.
[Warning] Logical Interface Migrated (logical-interface-migrated): A logical interface migrated to a different node.
[Info] Vserver Discovered (vserver-discovered): A vserver was discovered.
[Info] Vserver Deleted (vserver-deleted): A vserver was deleted.
[Info] Vserver Renamed (vserver-renamed): A vserver was renamed.

Below is the same information in table format, with the event class for each event.

Event Name	Severity	Class
cluster-discovered	Information	cluster.discovered
cluster-node-added	Information	cluster.node.added
cluster-node-removed	Information	cluster.node.removed
cluster-reachable	Normal	ping.status
cluster-renamed	Information	cluster.renamed
cluster-unreachable	Critical	ping.status
port-role:changed	Information	port.roleChange
port-status:down	Error	port.status
port-status:undef	Normal	port.status
port-status:unknown	Normal	port.status
port-status:up	Normal	port.status
logical-interface-status:down	Error	lif.status
logical-interface-status:unknown	Normal	lif.status
logical-interface-status:up	Normal	lif.status
logical-interface:migrated	Warning	lif.migration
vserver-deleted	Information	vserver.deleted
vserver-discovered	Information	vserver.discovered
vserver-renamed	Information	vserver.renamed
vserver-running	Information	vserver.running
vserver-stopped	Information	vserver.stopped
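If you want to work with this table programmatically, for example to decide which C-Mode events deserve an alarm, the rows can be loaded into a small lookup structure. The sketch below is only an illustration: the event names, severities and classes are copied from the table above, but the helper functions (events_with_severity, events_by_class) are hypothetical and are not part of DFM. Filtering uses an explicit set of severities rather than assuming any ordering between severity levels.

```python
from collections import defaultdict

# (event name, severity, event class), copied from the C-Mode table above.
C_MODE_EVENTS = [
    ("cluster-discovered", "Information", "cluster.discovered"),
    ("cluster-node-added", "Information", "cluster.node.added"),
    ("cluster-node-removed", "Information", "cluster.node.removed"),
    ("cluster-reachable", "Normal", "ping.status"),
    ("cluster-renamed", "Information", "cluster.renamed"),
    ("cluster-unreachable", "Critical", "ping.status"),
    ("port-role:changed", "Information", "port.roleChange"),
    ("port-status:down", "Error", "port.status"),
    ("port-status:undef", "Normal", "port.status"),
    ("port-status:unknown", "Normal", "port.status"),
    ("port-status:up", "Normal", "port.status"),
    ("logical-interface-status:down", "Error", "lif.status"),
    ("logical-interface-status:unknown", "Normal", "lif.status"),
    ("logical-interface-status:up", "Normal", "lif.status"),
    ("logical-interface:migrated", "Warning", "lif.migration"),
    ("vserver-deleted", "Information", "vserver.deleted"),
    ("vserver-discovered", "Information", "vserver.discovered"),
    ("vserver-renamed", "Information", "vserver.renamed"),
    ("vserver-running", "Information", "vserver.running"),
    ("vserver-stopped", "Information", "vserver.stopped"),
]


def events_with_severity(severities):
    """Hypothetical helper: event names whose severity is in `severities`, in table order."""
    wanted = set(severities)
    return [name for name, severity, _ in C_MODE_EVENTS if severity in wanted]


def events_by_class():
    """Hypothetical helper: group event names by event class, in table order."""
    groups = defaultdict(list)
    for name, _, event_class in C_MODE_EVENTS:
        groups[event_class].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Candidates for an alarm: the Error and Critical events.
    print(events_with_severity({"Error", "Critical"}))
    # Reachable and unreachable share one event class, so they track the same state.
    print(events_by_class()["ping.status"])
```

Grouping by class makes it visible that paired events such as cluster-reachable and cluster-unreachable share the ping.status class, i.e. they report opposite values of the same status.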

Below is the list of common events that are generated for C-Mode objects as well as for 7G/7-Mode objects:

Event Group	Events
Volume	volume-almost-full
	volume-clone:deleted
	volume-clone:discovered
	volume-full
	volume-growth-rate:abnormal
	volume-growth-rate:ok
	volume-new-snapshot
	volume-offline-or-destroyed
	volume-online
	volume-snapshot-deleted
	volume-space-normal
	inodes-almost-full
	inodes-full
	inodes-utilization-normal
Aggregate	aggregate-almost-full
	aggregate-almost-overcommitted
	aggregate-full
	aggregate-not-overcommitted
	aggregate-overcommitted
	aggregate-snapshot-reserve-almost-full
	aggregate-snapshot-reserve-full
	aggregate-snapshot-reserve-ok
	aggregate-space-normal
	aggregate:deleted
	aggregate:discovered
	aggregate:failed
	aggregate:offline
	aggregate:online
	aggregate:restricted
NVRAM	nvram-battery:discharged
	nvram-battery:fully-charged
	nvram-battery:low
	nvram-battery:missing
	nvram-battery:normal
	nvram-battery:old
	nvram-battery:overcharged
	nvram-battery:replace
	nvram-battery:unknown-status
CPU	cpu-load-normal
	cpu-too-busy
Enclosures	enclosures-active
	enclosures-disappeared
	enclosures-failed
	enclosures-found
	enclosures-inactive
	enclosures-ok
Fans	fans:many-failed
	fans:normal
	fans:one-failed
Host	host-discovered
	host-down
	host-login:failed
	host-login:ok
	host-snmp-not-responding
	host-snmp-ok
	host-up
	host:identity-conflict
	host:identity-ok
	host:name-changed
	host:system-id-changed
Power supplies	power-supplies:many-failed
	power-supplies:normal
	power-supplies:one-failed
Snapshots	snap-count:exceeded
	snap-count:ok
	snapshot-full
	snapshot-space-ok
	snapshots:disabled
	snapshots:enabled
	snapshots:not-too-old
	snapshots:too-old
Environmentals	temperature-hot
	temperature-normal

Regards

adai.

mrinal · 2011-06-21

Hi,

I have questions about some of the events listed above...

[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.

>>> Does this refer to the 'cluster-mgmt' LIF? Can we set the node-mgmt LIF to be used in case the cluster-mgmt LIF is not available?

[Info] Cluster Renamed (cluster-renamed): A cluster got renamed.

>>> If this event is raised after the node is renamed but before the reboot, then the event should be marked as pending, because the action is not complete until the node is rebooted.

[Normal] Port Status Up (port-status-up): A cluster port status is up.

[Error] Port Status Down (port-status-down): A cluster port status is down.

[Normal] Port Status Undefined (port-status-undef): A cluster port status is undefined.

[Normal] Port Status Unknown (port-status-unknown): A cluster port status is unknown.

[Info] Port Role Changed (port-role-changed): A cluster port role has changed.

>>> Do we have similar events for ports that are in other roles?

msaravan · 2011-06-21

Hi Mrinal Devadas,

Find my answers inline (in italics, prefixed with [Saravanan]):

[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.

>>> Does this refer to the 'cluster-mgmt' LIF? Can we set the node-mgmt LIF to be used in case the cluster-mgmt LIF is not available?

[Saravanan] Up to DFM 4.0.1, you can use either the 'cluster-mgmt' LIF or a node-mgmt LIF as the primary address for monitoring. If one address is not reachable, you can always switch to the other using "dfm host set <hostid> hostprimaryaddress=<>".
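The switch-over described above can be sketched as: try the management addresses in order of preference, keep the first one that answers, and build the "dfm host set" command for it. This is a hypothetical helper, not a DFM feature. The reachability check is passed in as a function because how you test reachability (ping, SNMP, etc.) is up to you, and the addresses in the usage note are made up. Only the command syntax comes from the reply above.

```python
from typing import Callable, List, Optional, Sequence


def pick_primary_address(candidates: Sequence[str],
                         is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable address, e.g. cluster-mgmt LIF first, then node-mgmt LIFs."""
    for address in candidates:
        if is_reachable(address):
            return address
    return None  # nothing reachable; leave the DFM host entry unchanged


def dfm_set_primary_address_cmd(host_id: str, address: str) -> List[str]:
    """Build the argv for: dfm host set <hostid> hostprimaryaddress=<address>."""
    return ["dfm", "host", "set", host_id, f"hostprimaryaddress={address}"]


if __name__ == "__main__":
    # Hypothetical addresses: cluster-mgmt LIF first, then a node-mgmt LIF.
    candidates = ["192.0.2.10", "192.0.2.11"]
    down = {"192.0.2.10"}  # pretend the cluster-mgmt LIF is unreachable
    chosen = pick_primary_address(candidates, lambda a: a not in down)
    if chosen is not None:
        print(" ".join(dfm_set_primary_address_cmd("42", chosen)))
```

On the DFM server you could hand the returned list to subprocess.run(); the sketch only prints it so that it can be run anywhere.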

[Info] Cluster Renamed (cluster-renamed): A cluster got renamed.

>>> If this event is raised after the node is renamed but before the reboot, then the event should be marked as pending, because the action is not complete until the node is rebooted.

[Saravanan] I don't think a reboot is a mandatory requirement for the rename feature. If it is, please share some data and I'll verify the behavior in DFM and let you know.

[Normal] Port Status Up (port-status-up): A cluster port status is up.

[Error] Port Status Down (port-status-down): A cluster port status is down.

[Normal] Port Status Undefined (port-status-undef): A cluster port status is undefined.

[Normal] Port Status Unknown (port-status-unknown): A cluster port status is unknown.

[Info] Port Role Changed (port-role-changed): A cluster port role has changed.

>>> Do we have similar events for ports that are in other roles?

[Saravanan] It's for all ports. There is no limitation by role.

tulsiraj · 2011-06-21

[Critical] Cluster Not Reachable (cluster-unreachable): A cluster was not reachable from DataFabric Manager network.

>>> Does this refer to the 'cluster-mgmt' LIF? Can we set the node-mgmt LIF to be used in case the cluster-mgmt LIF is not available?

I would suggest:

1. You can configure an alternate IP address for the cluster management LIF, or

2. If your cluster-mgmt LIF is not reachable because of a port-down or node-reachability issue, you can configure a failover policy for this LIF so that it can fail over to any available port within the cluster.

mrinal · 2011-06-22

Thank you for the answers.