Solved: Re: FCVI Source Identifier Message

mr · ‎2011-11-29

Hello,

I installed a Fabric Attached Metro Cluster last week with DOT8.0.2P3. At a boot/reboot and at a giveback I see some strange messages.

I know, it is normal at boot/giveback, that a reset of a FCVI Card can happend. But following message is strange: "Source identifier for FCVI interconnect message (0) is invalid; dropping the message. Check cabling for a port mismatch." I checked the cables and switches, everything looks good.

Here is the output during a giveback...

filer01(takeover)*> cf giveback

filer01(takeover)*> Tue Nov 29 09:54:58 CET [filer01 (takeover): cf.misc.operatorGiveback:info]: Failover monitor: giveback initiated by operator

Tue Nov 29 09:54:58 CET [filer01: cf.fm.givebackStarted:notice]: Failover monitor: giveback started

CIFS partner server is shutting down...

CIFS partner server has shut down...

Tue Nov 29 09:54:59 CET [filer02/filer01: ftpd.service.off:info]: The FTP subsystem stopped.

Tue Nov 29 09:54:59 CET [filer02/filer01: fcp.service.shutdown:info]: FCP service shutdown

Aggregate aggr0 may not be dissolved because of 1 snapshot reference

Tue Nov 29 09:55:00 CET [filer01: monitor.globalStatus.ok:info]: This node has taken over filer02.

Tue Nov 29 09:55:02 CET [filer01: cf.rsrc.transitTime:notice]: Top Giveback transit times raid=2714, wafl=397 {giveback_sync=225, sync_clean=147, forget=21, finish=4, mark_abort=0, wait_offline=0, wait_create=0, vol_refs=0, abort_scans=0, drain_msgs=0}, wafl_gb_sync=193, registry_giveback=27, nfsd=16, sanown_replay=10, snapvault=4, java=3, nlm=1, ifconfig=1

Tue Nov 29 09:55:02 CET [filer01: callhome.sfo.giveback:info]: Call home for CONTROLLER GIVEBACK COMPLETE

Tue Nov 29 09:55:02 CET [filer01: cf.fm.givebackComplete:notice]: Failover monitor: giveback completed

Tue Nov 29 09:55:02 CET [filer01: cf.fm.givebackDuration:notice]: Failover monitor: giveback duration time is 4 seconds

Tue Nov 29 09:55:02 CET [filer01: cf.fsm.stateTransit:info]: Failover monitor: TAKEOVER --> UP

Tue Nov 29 09:55:02 CET [filer01: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of filer01 by filer02 disabled (unsynchronized log).

Tue Nov 29 09:55:05 CET [filer01: cf.fm.timeMasterStatus:info]: Acting as time master

Tue Nov 29 09:55:06 CET [filer01: cf.fsm.takeoverOfPartnerDisabled:notice]: Failover monitor: takeover of filer02 disabled (interconnect error).

Tue Nov 29 09:55:06 CET [filer01: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of filer01 by filer02 disabled (interconnect error).

Tue Nov 29 09:55:07 CET [filer01: cf.ic.invalidSrcAddr:error]: Source identifier for FCVI interconnect message (0) is invalid; dropping the message. Check cabling for a port mismatch.

Tue Nov 29 09:55:09 CET [filer01: cf.hwassist.emptyPrtnrAddr:warning]: Partner address is empty; set it using the command 'options cf.hw_assist.partner.address' and to make the hardware-assisted takeover work.

Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:b4:e6', address 0x40400.

Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:b4:e2', address 0x30400.

Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:cf:1a', address 0x30500.

Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:cf:24', address 0x20300.

Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:cf:14', address 0x40500.

Tue Nov 29 09:55:10 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:cf:18', address 0x10300.

Tue Nov 29 09:55:10 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:cf:34', address 0x10600.

Tue Nov 29 09:55:10 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:b4:d8', address 0x20600.

RDB-HA ending primary

Tue Nov 29 09:55:14 CET [filer01: cf.fsm.takeoverOfPartnerDisabled:notice]: Failover monitor: takeover of filer02 disabled (unsynchronized log).

Tue Nov 29 09:55:14 CET [filer01: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of filer01 by filer02 disabled (unsynchronized log).

Tue Nov 29 09:55:15 CET [filer01: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of filer02 enabled

Tue Nov 29 09:55:22 CET [filer01: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of filer01 by filer02 enabled

filer01*> cf monitor all

cf: Current monitor status (29Nov2011 09:55:26):

partner 'filer02', VIA Interconnect is up (link 0 up, link 1 up)

HA Interconnect Device is FCVI ( Qlogic 2532 )

state UP, time 488100, event CHECK_FSM, elem ChkMbValid (13)

mirrorConsistencyRequired TRUE

takeoverByPartner 0x2000 <TAKEOVER_ON_PANIC>

mirrorEnabled TRUE, lowMemory FALSE, memio UNINIT, killPackets TRUE

degraded FALSE, reservePolicy ALWAYS_AFTER_TAKEOVER, resetDisks TRUE

hw_assist status:

hw_assist Active on emausstore01. emausstore01 monitoring alerts from partner emainstore01

hw_assist Active on emainstore01: emainstore01 monitoring alerts from partner emausstore01

timeouts:

fast 1000, slow 2500, mailbox 10000, connect 5000

operator 600000, firmware 15000 (recvd 488100), dumpcore 60000

booting 300000 (recvd 0)

transit timer enabled TRUE, transit 600000 (last 464213)

mailbox disks:

Disk brc300_backend_sw01:2.19 is a local mailbox disk

Disk brc300_backend_sw01:2.20 is a local mailbox disk

Disk brc300_backend_sw03:2.35 is a local mailbox disk

Disk brc300_backend_sw04:2.36 is a local mailbox disk

Disk brc300_backend_sw04:6.18 is a partner mailbox disk

Disk brc300_backend_sw03:6.19 is a partner mailbox disk

Disk brc300_backend_sw01:6.34 is a partner mailbox disk

Disk brc300_backend_sw02:6.35 is a partner mailbox disk

primary state:

version 2, senderSysid 1574032162

cluster_time 1322556589, hbt 218, node_status TAKEOVER_ENABLED

info 0x2000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_MAILBOX, abs_time 1322556925, sk_time 487100

channel_status 0

channel CHANNEL_IC, abs_time 1322556926, sk_time 488100

channel_status 0

channel CHANNEL_NETWORK, abs_time 0, sk_time 0

channel_status -1

backup state:

version 2, senderSysid 1574032729

cluster_time 1322556589, hbt 93579, node_status TAKEOVER_ENABLED

info 0x2000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_MAILBOX, abs_time 1322556923, sk_time 485941

channel_status 0

Channel Read Ctx:

version 2, senderSysid 1574032729

cluster_time 1322556589, hbt 93577, node_status TAKEOVER_ENABLED

info 0x2000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_IC, abs_time 1322556926, sk_time 488100

channel_status 0

Channel Read Ctx:

version 2, senderSysid 1574032729

cluster_time 1322556589, hbt 93579, node_status TAKEOVER_ENABLED

info 0x2000 <TAKEOVER_ON_PANIC>

flags 0x0 <>

channel CHANNEL_NETWORK, abs_time 0, sk_time 0

channel_status -1

Channel Read Ctx:

version 2, senderSysid 0

cluster_time 0, hbt 0, node_status UNKNOWN

info 0x0 <>

flags 0x0 <>

takeoverState FT_NONE, takeoverString 'No takeover information'

givebackState FT_NONE, givebackString 'No giveback information'

givebackRetries 0, givebackRequested FALSE

autoGivebackEnabled FALSE, autoGivebackWasDone FALSE, autoGivebackCifsStopping FALSE

autoGivebackLastVetoCheck 0, autoGivebackAttemptsExceeded FALSE

Maximum primary disk mailbox io times: normal = 237, transition = 0

Maximum backup disk mailbox io times: normal = 562, transition = 0

Num times logs unsynced : 0

Total system uptime: 488331 msec

Sync state total time : 39297 msec

Sync state Max time : 35369 msec

filer01*> cf status

Cluster enabled, filer02is up.

Has anyone a idea?!?

mr · ‎2012-02-27

hi,

its a bug, which is fixed with DOT8.0.2P4

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=448454

Regards

Markus

View solution in original post

proactsupportnl · ‎2012-02-26

Reviving this old thread

We had that same issue today as well. When the taken over filer was booted to "Waiting for giveback" this message was displayed. However giveback was succesful and afterwards cf status and cf monitor showed all okay.

Does anyone have any ideas?

mr · ‎2012-02-27

hi,

its a bug, which is fixed with DOT8.0.2P4

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=448454

Regards

Markus