Cluster alert: partner mailbox disks not accessible or invalid

HUGOSECO01

Hi all,

My name is Hugo and this is my first post in this community. ;-)

We have two FAS3050s configured as a cluster. A few days ago we saw this message on the second node:

Fri Jun 14 08:00:04 MEST [netapp2: statd:ALERT]: Cluster is licensed but takeover of partner is disabled due to reason : partner mailbox disks not accessible or invalid

but on Node1 everything is OK. No problems are detected with the cluster.
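If you collect console logs from both nodes, the alert line quoted above can be picked apart with a short script. This is only a sketch: the regular expression is derived from that single line, so the assumed layout (timestamp, then `[host: source:SEVERITY]:`, then the message) may not hold for every message, and the function name `parse_alert` is made up for illustration.

```python
import re

# Assumed layout, based only on the one line quoted in this thread:
#   <timestamp> [<host>: <source>:<SEVERITY>]: <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>.+?) \[(?P<host>[^:\]]+): (?P<source>[^:\]]+):"
    r"(?P<severity>[A-Za-z]+)\]: (?P<message>.*)$"
)


def parse_alert(line):
    """Return the fields of one console line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Keep the text after "due to reason :" separately when it is present.
    _, sep, reason = fields["message"].partition("due to reason :")
    fields["reason"] = reason.strip() if sep else None
    return fields


alert = parse_alert(
    "Fri Jun 14 08:00:04 MEST [netapp2: statd:ALERT]: Cluster is licensed but "
    "takeover of partner is disabled due to reason : partner mailbox disks not "
    "accessible or invalid"
)
print(alert["host"], alert["severity"], alert["reason"])
```

Lines that do not follow the assumed layout simply return None, so the function can be run over a whole log file and the results filtered by host or severity.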

Any idea how we can resolve this situation? If Node1 goes down, we assume Node2 won't be able to take over.

Thanks a lot, Hugo.


bondbhola

Please share the output of the command below:

cf monitor all

Thanks,

Bhola Gond

HUGOSECO01

This is from Node1:

netapp1> cf monitor all

cf: Current monitor status (19Jun2013 14:40:29):

partner 'netapp2', VIA Interconnect is up (link 0 up, link 1 up)

state UP, time 49476116655, event CHECK_FSM, elem ChkMbValid (12)

mirrorConsistencyRequired TRUE

takeoverByPartner 0x12000 <TAKEOVER_ON_PANIC>

mirrorEnabled TRUE, lowMemory FALSE, memio UNINIT, killPackets TRUE

degraded FALSE, reservePolicy ALWAYS_AFTER_TAKEOVER, resetDisks TRUE

timeouts:

    fast 1000, slow 2500, mailbox 10000, connect 5000

    operator 600000, firmware 10000 (recvd 49476116655), dumpcore 60000

    booting 300000 (recvd 0)

    transit timer enabled TRUE, transit 600000 (last 73383)

mailbox disks:

Disk 0a.17 is a local mailbox disk

Disk 0c.21 is a local mailbox disk

Disk 0a.22 is a local mailbox disk

Disk 0c.22 is a local mailbox disk

Disk 0b.17 is a partner mailbox disk

Disk 0d.22 is a partner mailbox disk

Disk 0b.16 is a partner mailbox disk

Disk 0d.16 is a partner mailbox disk

primary state:

        version 2, senderSysid 101176010

        cluster_time 1291254368, hbt 108343995, node_status TAKEOVER_ENABLED

        info 0x12000 <TAKEOVER_ON_PANIC>

        flags 0x0 <>

        channel CHANNEL_MAILBOX, abs_time 1371645626, sk_time 49476114655

        channel_status 0

        channel CHANNEL_IC, abs_time 1371645628, sk_time 49476116655

        channel_status 0

        channel CHANNEL_NETWORK, abs_time 0, sk_time 0

        channel_status -1

backup state:

        version 2, senderSysid 101165807

        cluster_time 1291254368, hbt 108608956, node_status TAKEOVER_ENABLED

        info 0x12000 <TAKEOVER_ON_PANIC>

        flags 0x0 <>

        channel CHANNEL_MAILBOX, abs_time 1371645628, sk_time 49476116459

        channel_status 0

        Channel Read Ctx:

        version 2, senderSysid 101165807

        cluster_time 1291254368, hbt 108608956, node_status TAKEOVER_ENABLED

        info 0x12000 <TAKEOVER_ON_PANIC>

        flags 0x0 <>

        channel CHANNEL_IC, abs_time 1371645628, sk_time 49476116655

        channel_status 0

        Channel Read Ctx:

        version 2, senderSysid 101165807

        cluster_time 1291254368, hbt 108608955, node_status TAKEOVER_ENABLED

        info 0x12000 <TAKEOVER_ON_PANIC>

        flags 0x0 <>

        channel CHANNEL_NETWORK, abs_time 0, sk_time 0

        channel_status -1

        Channel Read Ctx:

        version 2, senderSysid 0

        cluster_time 0, hbt 0, node_status UNKNOWN

        info 0x0 <>

        flags 0x0 <>

takeoverState FT_NONE, takeoverString 'No takeover information'

givebackState FT_NONE, givebackString 'No giveback information'

givebackRetries 0, givebackRequested FALSE

autoGivebackEnabled FALSE, autoGivebackWasDone FALSE, autoGivebackCifsStopping FALSE

autoGivebackLastVetoCheck 0, autoGivebackAttemptsExceeded FALSE

Maximum primary disk mailbox io times: normal = 5265, transition = 2055

Maximum backup disk mailbox io times: normal = 32919, transition = 94

Num times logs unsynced : 2

Total system uptime: 49476117292 msec

Unsync state total time : 6050892 msec

Unsync state  Max  time : 5461322 msec

  Sync state total time : 49468738613 msec

  Sync state  Max  time : 49456574037 msec

netapp1*>
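Since the alert is about mailbox disks, it may help to pull the mailbox disk list out of the output above and see which adapters those disks sit on. The sketch below assumes the `Disk <name> is a local|partner mailbox disk` wording shown above and assumes that the part of the disk name before the dot is the adapter; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "Disk 0b.17 is a partner mailbox disk".
DISK_RE = re.compile(r"^Disk (?P<disk>\S+) is a (?P<owner>local|partner) mailbox disk$")


def mailbox_disks(cf_monitor_output):
    """Group mailbox disk names by owner ('local' or 'partner')."""
    disks = defaultdict(list)
    for raw in cf_monitor_output.splitlines():
        match = DISK_RE.match(raw.strip())
        if match:
            disks[match.group("owner")].append(match.group("disk"))
    return dict(disks)


def adapters(disk_names):
    """Return the sorted adapter prefixes (assumed: the text before the dot)."""
    return sorted({name.split(".", 1)[0] for name in disk_names})


# Mailbox section copied from the netapp1 output above.
sample = """\
mailbox disks:
Disk 0a.17 is a local mailbox disk
Disk 0c.21 is a local mailbox disk
Disk 0a.22 is a local mailbox disk
Disk 0c.22 is a local mailbox disk
Disk 0b.17 is a partner mailbox disk
Disk 0d.22 is a partner mailbox disk
Disk 0b.16 is a partner mailbox disk
Disk 0d.16 is a partner mailbox disk
"""

found = mailbox_disks(sample)
print("local adapters:  ", adapters(found["local"]))
print("partner adapters:", adapters(found["partner"]))
```

Run against netapp2's output, if it can be obtained, the same script would show which adapters that node uses to reach its partner mailbox disks, which narrows down the loops worth checking first.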

And from Node2 (the one showing the cluster alert):

netapp2> cf monitor all

Usage: cf [disable|enable|forcetakeover [-[d][f]]|forcegiveback|giveback [-f]|monitor|partner|status [-t]|takeover [-f | -n]]

netapp2>

netapp2>

Is something wrong?

Thanks a lot Bhola !!!

martin_fisher

Can you post the output of cf status from netapp2?

HUGOSECO01

Hi Martin,

Output from netapp2:

netapp2> cf status

netapp1 is up, takeover disabled because of reason (partner mailbox disks not accessible or invalid)

netapp2>
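To keep an eye on this after the cabling is checked, the `cf status` line above can be tested from a script. This is a minimal sketch that only recognises the exact "takeover disabled because of reason (...)" wording shown here; how the output is collected from the filer is left out, and the function name is hypothetical.

```python
import re

# Matches the line shown above, e.g.
#   netapp1 is up, takeover disabled because of reason (...)
DISABLED_RE = re.compile(
    r"^(?P<partner>\S+) is up, takeover disabled because of reason \((?P<reason>.*)\)$"
)


def takeover_disabled_reason(cf_status_output):
    """Return (partner, reason) if takeover is reported disabled, else None."""
    for raw in cf_status_output.splitlines():
        match = DISABLED_RE.match(raw.strip())
        if match:
            return match.group("partner"), match.group("reason")
    return None


status = takeover_disabled_reason(
    "netapp1 is up, takeover disabled because of reason "
    "(partner mailbox disks not accessible or invalid)\n"
)
if status is not None:
    partner, reason = status
    print(f"takeover of {partner} is disabled: {reason}")
```

A return value of None only means this particular wording was not found; it is not proof that takeover is enabled.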

sharmadevender

If it was working fine earlier, verify all FC loops and check for this bug.

Bug Detail

Bug ID: 363611
Title: takeover disabled because of reason (partner mailbox disks not accessible or invalid)
Duplicate of: (none)
Bug Severity: 3 - Serious inconvenience
Bug Status: Fixed
Product: Data ONTAP
Bug Type: Unknown

Description:
This condition can occur if the Fibre Channel HBA reports a LOOP_DOWN status after attempting to issue an I/O operation on a path. Consequently, if the HBA reports a LOOP_DOWN status then Data ONTAP may prematurely fail the I/O and neglect to attempt the maximum number of retry operations permitted on each available path. The defect observed in this report is due to such a failure, Data ONTAP failed to complete a cluster level I/O operation successfully causing clustering to become disabled.

Workaround:
There is no known workaround.

Notes: (none)

Fixed-In Version: A complete list of releases where this bug is fixed is available here.
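To make the bug description a bit more concrete, here is a toy model of the difference between giving up on the first LOOP_DOWN and using every permitted retry on every path. It is not Data ONTAP code: the retry limit, the function names, and the path labels are all made up for illustration.

```python
MAX_RETRIES_PER_PATH = 3  # assumed value, chosen only for this sketch


class LoopDown(Exception):
    """Stands in for the HBA reporting LOOP_DOWN on a path."""


def issue_io(paths, send, give_up_on_loop_down):
    """Try an I/O on each path, up to MAX_RETRIES_PER_PATH attempts per path.

    send(path) returns True on success, False on an ordinary failure,
    and raises LoopDown when the path reports LOOP_DOWN.
    Returns (succeeded, attempts_made).
    """
    attempts = 0
    for path in paths:
        for _ in range(MAX_RETRIES_PER_PATH):
            attempts += 1
            try:
                if send(path):
                    return True, attempts
            except LoopDown:
                if give_up_on_loop_down:
                    # Behaviour described in the bug: fail the I/O early
                    # without using the remaining retries or paths.
                    return False, attempts
                # Otherwise count it as one failed attempt and carry on.
    return False, attempts


def flaky_send(path):
    """Hypothetical transport: path '0b' is down, any other path works."""
    if path == "0b":
        raise LoopDown()
    return True


print(issue_io(["0b", "0d"], flaky_send, give_up_on_loop_down=True))   # (False, 1)
print(issue_io(["0b", "0d"], flaky_send, give_up_on_loop_down=False))  # (True, 4)
```

In the first call the I/O fails after a single attempt even though a healthy second path exists; in the second call the three retries on the bad path are used up and the I/O then succeeds on the other path.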

Hope this will help.

martin_fisher

Agreed. Check your cabling and make sure all the SFP/ACP cables are connected.

HUGOSECO01

Thanks! We are checking the cable connections.

I will let you know the results.
