ONTAP Discussions
ONTAP Discussions
Hello,
I installed a Fabric Attached Metro Cluster last week with DOT8.0.2P3. At a boot/reboot and at a giveback I see some strange messages.
I know, it is normal at boot/giveback, that a reset of a FCVI Card can happend. But following message is strange: "Source identifier for FCVI interconnect message (0) is invalid; dropping the message. Check cabling for a port mismatch." I checked the cables and switches, everything looks good.
Here is the output during a giveback...
filer01(takeover)*> cf giveback
filer01(takeover)*> Tue Nov 29 09:54:58 CET [filer01 (takeover): cf.misc.operatorGiveback:info]: Failover monitor: giveback initiated by operator
Tue Nov 29 09:54:58 CET [filer01: cf.fm.givebackStarted:notice]: Failover monitor: giveback started
CIFS partner server is shutting down...
CIFS partner server has shut down...
Tue Nov 29 09:54:59 CET [filer02/filer01: ftpd.service.off:info]: The FTP subsystem stopped.
Tue Nov 29 09:54:59 CET [filer02/filer01: fcp.service.shutdown:info]: FCP service shutdown
Aggregate aggr0 may not be dissolved because of 1 snapshot reference
Tue Nov 29 09:55:00 CET [filer01: monitor.globalStatus.ok:info]: This node has taken over filer02.
Tue Nov 29 09:55:02 CET [filer01: cf.rsrc.transitTime:notice]: Top Giveback transit times raid=2714, wafl=397 {giveback_sync=225, sync_clean=147, forget=21, finish=4, mark_abort=0, wait_offline=0, wait_create=0, vol_refs=0, abort_scans=0, drain_msgs=0}, wafl_gb_sync=193, registry_giveback=27, nfsd=16, sanown_replay=10, snapvault=4, java=3, nlm=1, ifconfig=1
Tue Nov 29 09:55:02 CET [filer01: callhome.sfo.giveback:info]: Call home for CONTROLLER GIVEBACK COMPLETE
Tue Nov 29 09:55:02 CET [filer01: cf.fm.givebackComplete:notice]: Failover monitor: giveback completed
Tue Nov 29 09:55:02 CET [filer01: cf.fm.givebackDuration:notice]: Failover monitor: giveback duration time is 4 seconds
Tue Nov 29 09:55:02 CET [filer01: cf.fsm.stateTransit:info]: Failover monitor: TAKEOVER --> UP
Tue Nov 29 09:55:02 CET [filer01: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of filer01 by filer02 disabled (unsynchronized log).
Tue Nov 29 09:55:05 CET [filer01: cf.fm.timeMasterStatus:info]: Acting as time master
Tue Nov 29 09:55:06 CET [filer01: cf.fsm.takeoverOfPartnerDisabled:notice]: Failover monitor: takeover of filer02 disabled (interconnect error).
Tue Nov 29 09:55:06 CET [filer01: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of filer01 by filer02 disabled (interconnect error).
Tue Nov 29 09:55:07 CET [filer01: cf.ic.invalidSrcAddr:error]: Source identifier for FCVI interconnect message (0) is invalid; dropping the message. Check cabling for a port mismatch.
Tue Nov 29 09:55:09 CET [filer01: cf.hwassist.emptyPrtnrAddr:warning]: Partner address is empty; set it using the command 'options cf.hw_assist.partner.address' and to make the hardware-assisted takeover work.
Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:b4:e6', address 0x40400.
Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:b4:e2', address 0x30400.
Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:cf:1a', address 0x30500.
Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:cf:24', address 0x20300.
Tue Nov 29 09:55:09 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:cf:14', address 0x40500.
Tue Nov 29 09:55:10 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:cf:18', address 0x10300.
Tue Nov 29 09:55:10 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2a' from '21:00:00:24:ff:32:cf:34', address 0x10600.
Tue Nov 29 09:55:10 CET [filer01: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '2b' from '21:00:00:24:ff:32:b4:d8', address 0x20600.
RDB-HA ending primary
Tue Nov 29 09:55:14 CET [filer01: cf.fsm.takeoverOfPartnerDisabled:notice]: Failover monitor: takeover of filer02 disabled (unsynchronized log).
Tue Nov 29 09:55:14 CET [filer01: cf.fsm.takeoverByPartnerDisabled:notice]: Failover monitor: takeover of filer01 by filer02 disabled (unsynchronized log).
Tue Nov 29 09:55:15 CET [filer01: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of filer02 enabled
Tue Nov 29 09:55:22 CET [filer01: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of filer01 by filer02 enabled
filer01*> cf monitor all
cf: Current monitor status (29Nov2011 09:55:26):
partner 'filer02', VIA Interconnect is up (link 0 up, link 1 up)
HA Interconnect Device is FCVI ( Qlogic 2532 )
state UP, time 488100, event CHECK_FSM, elem ChkMbValid (13)
mirrorConsistencyRequired TRUE
takeoverByPartner 0x2000 <TAKEOVER_ON_PANIC>
mirrorEnabled TRUE, lowMemory FALSE, memio UNINIT, killPackets TRUE
degraded FALSE, reservePolicy ALWAYS_AFTER_TAKEOVER, resetDisks TRUE
hw_assist status:
hw_assist Active on emausstore01. emausstore01 monitoring alerts from partner emainstore01
hw_assist Active on emainstore01: emainstore01 monitoring alerts from partner emausstore01
timeouts:
fast 1000, slow 2500, mailbox 10000, connect 5000
operator 600000, firmware 15000 (recvd 488100), dumpcore 60000
booting 300000 (recvd 0)
transit timer enabled TRUE, transit 600000 (last 464213)
mailbox disks:
Disk brc300_backend_sw01:2.19 is a local mailbox disk
Disk brc300_backend_sw01:2.20 is a local mailbox disk
Disk brc300_backend_sw03:2.35 is a local mailbox disk
Disk brc300_backend_sw04:2.36 is a local mailbox disk
Disk brc300_backend_sw04:6.18 is a partner mailbox disk
Disk brc300_backend_sw03:6.19 is a partner mailbox disk
Disk brc300_backend_sw01:6.34 is a partner mailbox disk
Disk brc300_backend_sw02:6.35 is a partner mailbox disk
primary state:
version 2, senderSysid 1574032162
cluster_time 1322556589, hbt 218, node_status TAKEOVER_ENABLED
info 0x2000 <TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_MAILBOX, abs_time 1322556925, sk_time 487100
channel_status 0
channel CHANNEL_IC, abs_time 1322556926, sk_time 488100
channel_status 0
channel CHANNEL_NETWORK, abs_time 0, sk_time 0
channel_status -1
backup state:
version 2, senderSysid 1574032729
cluster_time 1322556589, hbt 93579, node_status TAKEOVER_ENABLED
info 0x2000 <TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_MAILBOX, abs_time 1322556923, sk_time 485941
channel_status 0
Channel Read Ctx:
version 2, senderSysid 1574032729
cluster_time 1322556589, hbt 93577, node_status TAKEOVER_ENABLED
info 0x2000 <TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_IC, abs_time 1322556926, sk_time 488100
channel_status 0
Channel Read Ctx:
version 2, senderSysid 1574032729
cluster_time 1322556589, hbt 93579, node_status TAKEOVER_ENABLED
info 0x2000 <TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_NETWORK, abs_time 0, sk_time 0
channel_status -1
Channel Read Ctx:
version 2, senderSysid 0
cluster_time 0, hbt 0, node_status UNKNOWN
info 0x0 <>
flags 0x0 <>
takeoverState FT_NONE, takeoverString 'No takeover information'
givebackState FT_NONE, givebackString 'No giveback information'
givebackRetries 0, givebackRequested FALSE
autoGivebackEnabled FALSE, autoGivebackWasDone FALSE, autoGivebackCifsStopping FALSE
autoGivebackLastVetoCheck 0, autoGivebackAttemptsExceeded FALSE
Maximum primary disk mailbox io times: normal = 237, transition = 0
Maximum backup disk mailbox io times: normal = 562, transition = 0
Num times logs unsynced : 0
Total system uptime: 488331 msec
Sync state total time : 39297 msec
Sync state Max time : 35369 msec
filer01*> cf status
Cluster enabled, filer02is up.
Has anyone a idea?!?
Solved! See The Solution
hi,
its a bug, which is fixed with DOT8.0.2P4
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=448454
Regards
Markus
Reviving this old thread
We had that same issue today as well. When the taken over filer was booted to "Waiting for giveback" this message was displayed. However giveback was succesful and afterwards cf status and cf monitor showed all okay.
Does anyone have any ideas?
hi,
its a bug, which is fixed with DOT8.0.2P4
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=448454
Regards
Markus