Cluster alert: partner mailbox disks not accessible or invalid

HUGOSECO01 · ‎2013-06-19

Hi all,

My name is Hugo, and this is my first post in this community. ;-)

We have two FAS3050s configured as a cluster. A few days ago we saw this message on the second node:

Fri Jun 14 08:00:04 MEST [netapp2: statd:ALERT]: Cluster is licensed but takeover of partner is disabled due to reason : partner mailbox disks not accessible or invalid

On Node1, however, everything is OK: it reports no problems with the cluster.

Any idea how we can resolve this? If Node1 goes down, we assume Node2 will not be able to take over.

Thanks a lot,
Hugo

bondbhola · ‎2013-06-19

Please share the output of the command below:

cf monitor all

Thanks,

Bhola Gond

HUGOSECO01 · ‎2013-06-19

This is from Node1:

netapp1> cf monitor all

cf: Current monitor status (19Jun2013 14:40:29):

partner 'netapp2', VIA Interconnect is up (link 0 up, link 1 up)

state UP, time 49476116655, event CHECK_FSM, elem ChkMbValid (12)

mirrorConsistencyRequired TRUE

takeoverByPartner 0x12000 <TAKEOVER_ON_PANIC>

mirrorEnabled TRUE, lowMemory FALSE, memio UNINIT, killPackets TRUE

degraded FALSE, reservePolicy ALWAYS_AFTER_TAKEOVER, resetDisks TRUE

timeouts:

fast 1000, slow 2500, mailbox 10000, connect 5000

operator 600000, firmware 10000 (recvd 49476116655), dumpcore 60000

booting 300000 (recvd 0)

transit timer enabled TRUE, transit 600000 (last 73383)

mailbox disks:

Disk 0a.17 is a local mailbox disk

Disk 0c.21 is a local mailbox disk

Disk 0a.22 is a local mailbox disk

Disk 0c.22 is a local mailbox disk

Disk 0b.17 is a partner mailbox disk

Disk 0d.22 is a partner mailbox disk

Disk 0b.16 is a partner mailbox disk

Disk 0d.16 is a partner mailbox disk

primary state:

version 2, senderSysid 101176010

cluster_time 1291254368, hbt 108343995, node_status TAKEOVER_ENABLED

info 0x12000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_MAILBOX, abs_time 1371645626, sk_time 49476114655

channel_status 0

channel CHANNEL_IC, abs_time 1371645628, sk_time 49476116655

channel_status 0

channel CHANNEL_NETWORK, abs_time 0, sk_time 0

channel_status -1

backup state:

version 2, senderSysid 101165807

cluster_time 1291254368, hbt 108608956, node_status TAKEOVER_ENABLED

info 0x12000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_MAILBOX, abs_time 1371645628, sk_time 49476116459

channel_status 0

Channel Read Ctx:

version 2, senderSysid 101165807

cluster_time 1291254368, hbt 108608956, node_status TAKEOVER_ENABLED

info 0x12000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_IC, abs_time 1371645628, sk_time 49476116655

channel_status 0

Channel Read Ctx:

version 2, senderSysid 101165807

cluster_time 1291254368, hbt 108608955, node_status TAKEOVER_ENABLED

info 0x12000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_NETWORK, abs_time 0, sk_time 0

channel_status -1

Channel Read Ctx:

version 2, senderSysid 0

cluster_time 0, hbt 0, node_status UNKNOWN

info 0x0 <>

flags 0x0 <>

takeoverState FT_NONE, takeoverString 'No takeover information'

givebackState FT_NONE, givebackString 'No giveback information'

givebackRetries 0, givebackRequested FALSE

autoGivebackEnabled FALSE, autoGivebackWasDone FALSE, autoGivebackCifsStopping FALSE

autoGivebackLastVetoCheck 0, autoGivebackAttemptsExceeded FALSE

Maximum primary disk mailbox io times: normal = 5265, transition = 2055

Maximum backup disk mailbox io times: normal = 32919, transition = 94

Num times logs unsynced : 2

Total system uptime: 49476117292 msec

Unsync state total time : 6050892 msec

Unsync state Max time : 5461322 msec

Sync state total time : 49468738613 msec

Sync state Max time : 49456574037 msec

netapp1*>

And from Node2 (the one showing the cluster alert):

netapp2> cf monitor all

netapp2>

Node2 prints nothing. Is something wrong?

Thanks a lot, Bhola!
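For anyone comparing disk lists between the two nodes, here is a minimal sketch that pulls the mailbox disks out of the `cf monitor all` output from netapp1 above. It assumes the lines always look like "Disk X is a local mailbox disk" or "Disk X is a partner mailbox disk", which is only what this one capture shows; the function name `mailbox_disks` is made up for illustration.

```python
import re

# Pattern taken from the netapp1 capture in this thread; other Data ONTAP
# releases may word these lines differently (assumption).
MAILBOX_RE = re.compile(r"^Disk\s+(\S+)\s+is a (local|partner) mailbox disk$")


def mailbox_disks(monitor_output):
    """Group mailbox disk names by owner ("local" or "partner")."""
    disks = {"local": [], "partner": []}
    for line in monitor_output.splitlines():
        match = MAILBOX_RE.match(line.strip())
        if match:
            name, owner = match.groups()
            disks[owner].append(name)
    return disks


# Mailbox section copied from the netapp1 output above (blank lines removed).
SAMPLE = """\
mailbox disks:
Disk 0a.17 is a local mailbox disk
Disk 0c.21 is a local mailbox disk
Disk 0a.22 is a local mailbox disk
Disk 0c.22 is a local mailbox disk
Disk 0b.17 is a partner mailbox disk
Disk 0d.22 is a partner mailbox disk
Disk 0b.16 is a partner mailbox disk
Disk 0d.16 is a partner mailbox disk
"""

if __name__ == "__main__":
    result = mailbox_disks(SAMPLE)
    print("local:  ", result["local"])
    print("partner:", result["partner"])
```

The disks netapp1 lists as local are presumably the ones netapp2 cannot reach as its partner's mailbox disks, although netapp2 may see them under different adapter names.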

martin_fisher · ‎2013-06-19

Can you post the output of cf status from netapp2?

HUGOSECO01 · ‎2013-06-20

Hi Martin,

Output from netapp2:

netapp2> cf status

netapp1 is up, takeover disabled because of reason (partner mailbox disks not accessible or invalid)

netapp2>
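If you want to watch for this condition from a script, a rough sketch like the following could extract the partner name and the reason from a captured `cf status` line. The pattern is derived only from the single line shown above, so other `cf status` wordings are not covered; how the output is captured (ssh, rsh, a log file) is left out, and `takeover_disabled_reason` is a hypothetical helper name.

```python
import re

# Built from the one cf status line posted in this thread (assumption:
# the wording is stable on this release).
STATUS_RE = re.compile(
    r"^(?P<partner>\S+) is up, takeover disabled because of reason "
    r"\((?P<reason>[^)]*)\)$"
)


def takeover_disabled_reason(status_output):
    """Return (partner, reason) if takeover is reported disabled, else None."""
    for line in status_output.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            return match.group("partner"), match.group("reason")
    return None


# Output copied from netapp2 above.
SAMPLE = (
    "netapp2> cf status\n"
    "netapp1 is up, takeover disabled because of reason "
    "(partner mailbox disks not accessible or invalid)\n"
    "netapp2>\n"
)

if __name__ == "__main__":
    print(takeover_disabled_reason(SAMPLE))
```

A return value of None only means this particular message was not found; it does not prove that takeover is enabled.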

sharmadevender · ‎2013-06-20

If it was working fine earlier, verify all the FC loops and check for this bug.

Bug Detail

Bug ID: 363611
Title: takeover disabled because of reason (partner mailbox disks not accessible or invalid)
Duplicate of:
Bug Severity: 3 - Serious inconvenience
Bug Status: Fixed
Product: Data ONTAP
Bug Type: Unknown
Description: This condition can occur if the Fibre Channel HBA reports a LOOP_DOWN status after attempting to issue an I/O operation on a path. Consequently, if the HBA reports a LOOP_DOWN status then Data ONTAP may prematurely fail the I/O and neglect to attempt the maximum number of retry operations permitted on each available path. The defect observed in this report is due to such a failure, Data ONTAP failed to complete a cluster level I/O operation successfully causing clustering to become disabled.
Workaround: There is no known workaround.
Notes:
Fixed-In Version:
Data ONTAP 7.3.3RC1 (First Fixed) - Fixed
Data ONTAP 7.3.3 (GD) - Fixed
Data ONTAP 7.3.7 (GA) - Fixed
Data ONTAP 7.3.7P1 (GA) - Fixed
Data ONTAP 8.0.5 (GA) - Fixed
Data ONTAP 8.2 (GA) - Fixed

Hope this will help.
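To make the bug description a bit more concrete, here is a toy model of the retry behaviour it describes. This is not Data ONTAP code: `issue_io`, `make_flaky_send`, the retry count of 3 and the path names are all invented for illustration. With `abort_on_loop_down=False` the I/O is retried on every path up to the limit; with `abort_on_loop_down=True` the first LOOP_DOWN ends the attempt early, which is roughly the premature failure the bug text talks about.

```python
OK = "OK"
LOOP_DOWN = "LOOP_DOWN"


def issue_io(paths, send, max_retries=3, abort_on_loop_down=False):
    """Try an I/O on each path, up to max_retries attempts per path.

    send(path) is a hypothetical callback returning OK or LOOP_DOWN.
    Returns (succeeded, number_of_attempts).
    """
    attempts = 0
    for path in paths:
        for _ in range(max_retries):
            attempts += 1
            status = send(path)
            if status == OK:
                return True, attempts
            if status == LOOP_DOWN and abort_on_loop_down:
                # Models the defect: give up without using the remaining
                # retries or the remaining paths.
                return False, attempts
    return False, attempts


def make_flaky_send(failures):
    """Return a send() that reports LOOP_DOWN `failures` times, then OK."""
    remaining = {"count": failures}

    def send(path):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return LOOP_DOWN
        return OK

    return send


if __name__ == "__main__":
    paths = ["path-A", "path-B"]  # illustrative names only
    print(issue_io(paths, make_flaky_send(1)))
    print(issue_io(paths, make_flaky_send(1), abort_on_loop_down=True))
```

In this model a single transient LOOP_DOWN is harmless when retries are honoured, but fails the whole I/O when they are skipped, which would fit a mailbox I/O failing even though the loop recovers a moment later.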

martin_fisher · ‎2013-06-20

Agreed. Check your cabling and make sure all the SFP/ACP cables are connected.

HUGOSECO01 · ‎2013-06-21

Thanks! We are checking the cable connections.

I will let you know the results.