
How to verify whether the MCTB issued a forcetakeover

We have deployed MetroCluster with TieBreaker 2.4.0 at a customer site. When we performed a site failover with a power outage, the automatic takeover did not happen, and we had to issue a cf forcetakeover command manually after waiting a couple of minutes. Afterwards I checked the controller and MCTB logs and compared them with the logs from a MetroCluster simulated in my lab. The logs are different, and it seems that the MCTB at the customer site did not behave correctly.

My questions are:

1) How can I verify whether the MCTB issued "cf forcetakeover -d"?

2) In the controller message log, it seems that a takeover started and then failed, and that a giveback was requested afterwards. Was this takeover triggered by the MCTB or by the normal cluster failover process?

This is the log from the surviving controller's console:

Tue Aug 26 19:01:23 HKT [Controller-B:cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(Controller-A), system_down because power_loss. 

Tue Aug 26 19:01:24 HKT [Controller-B:cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER 

Tue Aug 26 19:01:25 HKT [Controller-B:cf.fm.takeoverStarted:notice]: Failover monitor: takeover started 

Tue Aug 26 19:01:25 HKT [Controller-B:cf.ic.xferTimedOut:error]: WAFL interconnect transfer timed out 

Tue Aug 26 19:01:25 HKT [Controller-B:scsitarget.vtic.down:notice]: The VTIC is down. 

Tue Aug 26 19:01:25 HKT [Controller-B:rv.connection.torndown:info]: cfo_rv is torn down on NIC 0 

Tue Aug 26 19:01:36 HKT [Controller-B:scsi.path.excessiveErrors:error]: Excessive errors encountered by adapter 0c on disk device SKY_HP_8G80P_F01_LS3:30.126. 

Tue Aug 26 19:01:36 HKT [Controller-B:ses.access.noMoreValidPaths:CRITICAL]: No more valid paths to Enclosure Services in shelf 10 on channel SKY_HP_8G80P_F01_LS3:30. 

Tue Aug 26 19:01:36 HKT [Controller-B:ses.access.noMoreValidPaths:CRITICAL]: No more valid paths to Enclosure Services in shelf 11 on channel SKY_HP_8G80P_F01_LS3:30. 

Tue Aug 26 19:01:36 HKT [Controller-B:bridge.removed:info]: FC-to-SAS bridge SKY_HP_8G80P_F01_LS3:30.126L0 [ATTO     FibreBridge6500N 1.50] S/N [FB6500N114155] was removed. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L10 Shelf 10 Bay 9 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N17C1B0000B429AZGL] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:ds.sas.element.xport.error:error]: Shelf module A XPORT ERROR on channels SKY_HP_8G80P_F01_LS3:30/PARTNER shelf id 10 

Tue Aug 26 19:01:36 HKT [Controller-B:ds.sas.element.xport.error:error]: Shelf module A XPORT ERROR on channels SKY_HP_8G80P_F01_LS3:30/PARTNER shelf id 11 

:

:

Tue Aug 26 19:01:36 HKT [Controller-B:raid.vol.mirror.degraded:error]: Aggregate sata_aggr0 is mirrored and one plex has failed. It is no longer protected by mirroring. 

Tue Aug 26 19:01:36 HKT [Controller-B:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED 

:

:

Tue Aug 26 19:01:36 HKT [Controller-B:fmmb.lock.disk.remove:info]: Disk ?.? removed from local mailbox set. 

Tue Aug 26 19:01:36 HKT [Controller-B:fmmb.current.lock.disk:info]: Disk GL_HP_8G80P_F01_LS3:54.126L1 is a local HA mailbox disk. 

Tue Aug 26 19:01:36 HKT [Controller-B:fmmb.current.lock.disk:info]: Disk GL_HP_8G80P_F01_LS3:54.126L2 is a local HA mailbox disk. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sata_aggr0/plex1/rg0/SKY_HP_8G80P_F01_LS3:30.126L30 Shelf 11 Bay 3 [NETAPP   X477_SMEGX04TA07 NA01] S/N [Z1Z1XB5W00009409MB7W] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L11 Shelf 10 Bay 10 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18XS00000B429BE21] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L1 Shelf 10 Bay 0 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N17CLT0000B429B98V] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.vol.mirror.degraded:error]: Aggregate sas_aggr0 is mirrored and one plex has failed. It is no longer protected by mirroring. 

Tue Aug 26 19:01:36 HKT [Controller-B:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L2 Shelf 10 Bay 1 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18XQM0000B429AAMC] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L3 Shelf 10 Bay 2 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18Y3D0000B429FAS2] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L4 Shelf 10 Bay 3 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N17C0T0000B429B8Y4] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L5 Shelf 10 Bay 4 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18XQ40000B429BE53] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L6 Shelf 10 Bay 5 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18YT60000B426DDYL] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L7 Shelf 10 Bay 6 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18XX30000B429DKSP] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L8 Shelf 10 Bay 7 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N17C1P0000B429BA36] is missing. 

Tue Aug 26 19:01:36 HKT [Controller-B:raid.config.filesystem.disk.missing:info]: File system Disk /sas_aggr0/plex3/rg0/SKY_HP_8G80P_F01_LS3:30.126L9 Shelf 10 Bay 8 [NETAPP   X423_SLTNG900A10 NA00] S/N [S0N18YEL0000B429FD21] is missing. 

Tue Aug 26 19:01:37 HKT [Controller-B:cf.fm.takeoverFailed:error]: Failover monitor: takeover failed 'Controller-B_19:33:22_2014:07:16' 

Tue Aug 26 19:01:37 HKT [Controller-B:cf.fm.givebackStarted:notice]: Failover monitor: giveback started.

  

      Hilson


Re: How to verify whether the MCTB issued a forcetakeover

Hi Hilson

 

I guess you have already fixed that.

 

I always go to the MCTB install directory and run:

 

perl mctb_dfm.pl showlog mctb.log

 

to check the MCTB log.
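
On the controller side, it can also help to pull only the failover (cf.*) events out of the messages log, so that you get a short timeline to compare against the MCTB log. Below is a minimal Python sketch, assuming the line format shown in the console excerpt earlier in this thread. The function name failover_events and the sample lines are only illustrative and are not part of any NetApp tool. It does not by itself show whether the MCTB issued the command; it only makes the takeover sequence easier to read.

```python
import re

# Line format assumed from the console excerpt in this thread, e.g.:
# Tue Aug 26 19:01:24 HKT [Controller-B:cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
EVENT_RE = re.compile(
    r"^(?P<time>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<message>.*)$"
)


def failover_events(lines, prefix="cf."):
    """Return (time, event, severity, message) tuples for events whose
    name starts with the given prefix. Lines that do not match the
    assumed format (blank lines, ':' elision markers) are skipped."""
    found = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group("event").startswith(prefix):
            found.append((
                match.group("time"),
                match.group("event"),
                match.group("severity"),
                match.group("message"),
            ))
    return found


# Sample lines copied from the console log posted above.
SAMPLE = [
    "Tue Aug 26 19:01:24 HKT [Controller-B:cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER",
    "Tue Aug 26 19:01:25 HKT [Controller-B:scsitarget.vtic.down:notice]: The VTIC is down.",
    "Tue Aug 26 19:01:37 HKT [Controller-B:cf.fm.takeoverFailed:error]: Failover monitor: takeover failed 'Controller-B_19:33:22_2014:07:16'",
]

for time, event, severity, message in failover_events(SAMPLE):
    print(f"{time}  {event:<24} {severity:<8} {message}")
```

To run it against a saved copy of the messages log, pass the open file object to failover_events instead of SAMPLE.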