
Normal polling cycle log example for MCTB 2.4.0

Now that the TCP connect checks for MCTB have been deferred until the other CFOD criteria have been met, the logging for a normal polling cycle looks different. Here's an example of a healthy polling cycle using MCTB version 2.4.0:

2014-05-01 11:16:50,451 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Checking site reachability    <-- The polling cycle always starts with this message

2014-05-01 11:16:50,451 [Site1-ping-1] DEBUG com.netapp.rre.bautils.NetUtils - Pinging: 10.61.167.1       <-- Ping messages are only present if gateways are configured

2014-05-01 11:16:51,450 [Site1-ping-1] DEBUG com.netapp.rre.bautils.NetUtils -   10.61.167.1 reachable

2014-05-01 11:16:51,450 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Updating filer status

2014-05-01 11:16:51,559 [filerStatusUpdate-1] DEBUG com.netapp.rre.anegada.Filer - TS6080-1: cf-status CONNECTED, Enabled: true, IC: true

2014-05-01 11:16:51,559 [filerStatusUpdate-2] DEBUG com.netapp.rre.anegada.Filer - TS6080-2: cf-status CONNECTED, Enabled: true, IC: true

2014-05-01 11:16:51,559 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Checking interconnects

2014-05-01 11:16:51,559 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Getting aggr status for TS6080-1

2014-05-01 11:16:51,699 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-1: loading aggregate mirror status

2014-05-01 11:16:51,699 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-1:     aggr2: mirrored

2014-05-01 11:16:51,699 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-1:     aggr0: mirrored

2014-05-01 11:16:51,699 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-1:     aggr1: mirrored

2014-05-01 11:16:51,699 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Getting aggr status for TS6080-2

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-2: loading aggregate mirror status

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-2:     aggr1: mirrored

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-2:     aggr0: mirrored

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Filer - TS6080-2:     aggr2: mirrored

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Checking CFOD for filer TS6080-1

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor -    Survivor CF state: CONNECTED

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor -    Survivor IC: true

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor -    Root Aggr Mirror Degraded: false

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Checking CFOD for filer TS6080-2

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor -    Survivor CF state: CONNECTED

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor -    Survivor IC: true

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor -    Root Aggr Mirror Degraded: false

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Checking mirror degradation on filer TS6080-1

2014-05-01 11:16:51,824 [TSMC-monitor] DEBUG com.netapp.rre.anegada.Monitor - Checking mirror degradation on filer TS6080-2   <-- End of polling cycle

Additional log messages may appear depending on various settings and, of course, if any problems or CFOD-related conditions exist.
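
For anyone who wants to process these entries with a script, here is a minimal Python sketch that splits one entry into its fields. The pattern is only inferred from the sample lines above (timestamp, thread name in brackets, level, logger name, then " - " and the message); it is not taken from the MCTB logging configuration, and the names `LINE_RE` and `parse_line` are made up for this example.

```python
import re
from datetime import datetime

# Layout inferred from the sample entries in this post (an assumption):
#   <date> <time>,<millis> [<thread>] <LEVEL> <logger> - <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<level>\w+)\s+"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    # The timestamp uses a comma before the milliseconds.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S,%f")
    # Some messages are indented (e.g. "   Survivor IC: true"); drop the padding.
    fields["message"] = fields["message"].strip()
    return fields
```

Lines that do not match the pattern (blank lines, stack traces, and so on) come back as `None`, so the caller can decide whether to skip them.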

Re: Normal polling cycle log example for MCTB 2.4.0

Hi,

I'm trying to write instructions for non-NetApp monitoring staff so they know how to check MCTB status from the logs.

Is it possible to add a line like "Site check done, CFO NOT needed"? Right now there is a clear start point for the cycle, but no end point saying that the cycle has finished.

K/\I

Re: Normal polling cycle log example for MCTB 2.4.0

Hi K/\I,

There will always be "Checking site reachability" at the start of the polling cycle for each monitor ("TSMC-monitor" in the example above). That never changes. However, depending on the conditions of the MetroCluster, the end of the polling cycle might have several outcomes.

If everything is normal, there will always be two "Checking mirror degradation on filer" messages at the end of the cycle, one for each filer. However, if gateways are configured and neither site can be reached, then the cycle ends when the "Isolation-Detected" event is posted (which is logged). If either filer is in a Takeover or Giveback state, or CF has been disabled, then the cycle ends prematurely as well (there are specific log messages for these cases too). Finally, if CFOD is initiated, then the cycle also ends without checking for mirror degradation.

In each case, I would have to provide a different log message to indicate the end of the cycle. To avoid checking for many different log messages, it might be better to simply scan the log for everything between one "Checking site reachability" message and the next (or the end of the file). If you encounter the message "Initiating CFOD" in that span, then CFOD has happened; otherwise it has not.
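
That scan could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the log contains a single monitor (with several monitors you would first filter on the thread name, e.g. "[TSMC-monitor]"), there are two filers as in the example above, and the three marker strings appear in the log exactly as quoted in this thread. The function names are made up for the example.

```python
CYCLE_START = "Checking site reachability"
CFOD_MARKER = "Initiating CFOD"
MIRROR_CHECK = "Checking mirror degradation on filer"


def split_cycles(lines):
    """Group log lines into polling cycles.

    A cycle runs from one CYCLE_START line up to (not including) the next,
    or to the end of the input. Lines before the first start are ignored.
    """
    cycles = []
    current = None
    for line in lines:
        if CYCLE_START in line:
            current = []
            cycles.append(current)
        if current is not None:
            current.append(line)
    return cycles


def cfod_initiated(cycle):
    """True if the cycle contains the CFOD message."""
    return any(CFOD_MARKER in line for line in cycle)


def ended_normally(cycle, filer_count=2):
    """True if the cycle has one mirror-degradation check per filer.

    filer_count=2 is an assumption based on the two-filer example log.
    """
    checks = sum(1 for line in cycle if MIRROR_CHECK in line)
    return checks >= filer_count
```

To use it, read the log file into a list of lines, pass it to `split_cycles`, and test each cycle with `cfod_initiated` and `ended_normally`. Keep in mind that the last cycle in the file may still be in progress, so a missing mirror-degradation check there does not necessarily mean anything is wrong.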

Keep in mind that all abnormal conditions cause events to be sent to DFM, which can be configured to notify other applications by sending SNMP traps or email messages.  This can be configured under the "Setup->Alarms" menu and clicking the "Advanced Version" link next to the "Add" button in the lower right.   From there you can choose MetroCluster-TieBreaker specific events or even an entire class of events, and how to respond to them.    Using the "dfm" command line, you can also configure scripts to be executed when an event or class of events is received.

Hope this helps.

Brian