Hi,
I've got a FAS2050 that is no longer covered by a service contract. It has dual controllers; however, clustering is refusing to work. After setting up the partner system IDs, etc., I've got it to the point where if I run 'cf enable', the cluster comes up, but the logs never sync up. One of the controllers (c1) throws some odd errors:
c1> cf enable
c1>
Tue Sep 8 21:57:39 GMT [c1: cf.misc.operatorEnable:warning]: Cluster monitor: operator initiated enabling of cluster
Tue Sep 8 21:57:39 GMT [c1: cf.fsm.takeoverOfPartnerDisabled:notice]: Cluster monitor: takeover of c0 disabled (cluster takeover disabled by partner)
Tue Sep 8 21:57:39 GMT [c1: cf.fsm.takeoverByPartnerDisabled:notice]: Cluster monitor: takeover of c1 by c0 disabled (unsynchronized log)
Tue Sep 8 21:57:40 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
Tue Sep 8 21:57:43 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
Tue Sep 8 21:57:45 GMT [c1: cf.fsm.takeoverOfPartnerDisabled:notice]: Cluster monitor: takeover of c0 disabled (unsynchronized log)
Tue Sep 8 21:57:47 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
Tue Sep 8 21:57:51 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
Tue Sep 8 21:58:01 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
The errors I'm wondering about are the 'Interconnect nic 0 has error on VI <...>' errors. I only see them on this head, and not on the other one (c0). Swapping the positions of the controllers makes no difference. Is this likely a hardware issue with the integrated InfiniBand controller on c1, or could it be something else?
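To compare the two heads, it can help to tally how often each event appears in the console output. The following is a minimal Python sketch, not a NetApp tool: the function name `tally_events` and the regular expression are assumptions based only on the message format shown in the log above.

```python
import re
from collections import Counter

# Matches the bracketed part of a console message, e.g.
# "[c1: cf.nm.nicViError:info]" -> node "c1", event "cf.nm.nicViError",
# severity "info". The pattern is inferred from the log lines in this thread.
EVENT_RE = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]"
)


def tally_events(log_text):
    """Count how often each (node, event) pair appears in console output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group("node"), match.group("event"))] += 1
    return counts


# A few lines copied from the console output above.
SAMPLE = """\
Tue Sep 8 21:57:39 GMT [c1: cf.misc.operatorEnable:warning]: Cluster monitor: operator initiated enabling of cluster
Tue Sep 8 21:57:40 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
Tue Sep 8 21:57:43 GMT [c1: cf.nm.nicViError:info]: Interconnect nic 0 has error on VI #11 RECV_DESC_ERROR 2
"""

if __name__ == "__main__":
    for (node, event), count in sorted(tally_events(SAMPLE).items()):
        print(f"{node} {event}: {count}")
```

Running the same tally over the console output saved from c0 would show whether `cf.nm.nicViError` really is confined to c1.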
Thanks!
I suppose I should cover what I've already done. I read the post at http://communities.netapp.com/message/9031 and tried the things mentioned there. The interconnect is integrated in a FAS2050, so there is nothing I can do about the cable, and there is only a single cluster interconnect. I wiped all disks in the system clean and started from scratch with new mailbox disks, etc.; that did not help. I reseated the controllers; that did not help either. I've tried pretty much everything I can think of!
Carlson,
Can you share the /etc/rc configuration file for both the nodes?
Thanks;
Daniel
Certainly..
c0 (one without the error):
#Auto-generated by setup Fri Sep 4 23:15:09 GMT 2009
hostname c0
ifconfig e0a `hostname`-e0a mediatype auto flowcontrol full partner 192.168.0.34
ifconfig e0b `hostname`-e0b mediatype auto flowcontrol full partner 192.168.0.36
route add default 192.168.0.30 1
routed on
options dns.enable off
options nis.enable off
savecore
c1 (one with the error):
#Auto-generated by setup Fri Sep 4 23:17:55 GMT 2009
hostname c1
ifconfig e0a `hostname`-e0a mediatype auto flowcontrol full partner 192.168.0.33
ifconfig e0b `hostname`-e0b mediatype auto flowcontrol full partner 192.168.0.35
route add default 192.168.0.30 1
routed on
options dns.enable off
options nis.enable off
savecore
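A quick way to sanity-check the two rc files against each other is to pull out the `partner` value from each `ifconfig` line and confirm that both nodes define a partner on the same interfaces. This is only a sketch in Python: the helper names `partner_map` and `compare` are made up for illustration, and the parsing assumes the line layout shown in the two files above.

```python
import re

# Captures the interface name and the value after the "partner" keyword
# on an ifconfig line of an /etc/rc file (layout assumed from this thread).
IFCONFIG_RE = re.compile(
    r"^\s*ifconfig\s+(?P<iface>\S+)\s.*\bpartner\s+(?P<partner>\S+)"
)


def partner_map(rc_text):
    """Return {interface: partner value} for each ifconfig line with a partner."""
    result = {}
    for line in rc_text.splitlines():
        match = IFCONFIG_RE.match(line)
        if match:
            result[match.group("iface")] = match.group("partner")
    return result


def compare(rc_a, rc_b):
    """List interfaces that set a partner on one node but not on the other."""
    a, b = partner_map(rc_a), partner_map(rc_b)
    return sorted(set(a) ^ set(b))


# The ifconfig lines from the two rc files posted above.
RC_C0 = r"""
hostname c0
ifconfig e0a `hostname`-e0a mediatype auto flowcontrol full partner 192.168.0.34
ifconfig e0b `hostname`-e0b mediatype auto flowcontrol full partner 192.168.0.36
"""

RC_C1 = r"""
hostname c1
ifconfig e0a `hostname`-e0a mediatype auto flowcontrol full partner 192.168.0.33
ifconfig e0b `hostname`-e0b mediatype auto flowcontrol full partner 192.168.0.35
"""

if __name__ == "__main__":
    print("c0:", partner_map(RC_C0))
    print("c1:", partner_map(RC_C1))
    print("interfaces with a partner on only one node:", compare(RC_C0, RC_C1))
```

The sketch cannot tell whether each partner address is actually the other node's address, because the local addresses come from `hostname`-e0a style names resolved in /etc/hosts, which was not posted.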
Also, here is the output of 'cf monitor' and 'cf status':
c1> cf status
c0 is up, takeover disabled because of reason (unsynchronized log)
c1 has disabled takeover by c0 (unsynchronized log)
VIA Interconnect is down (link up).
c1> cf monitor
current time: 09Sep2009 04:40:28
UP 00:08:12, partner 'c0', cluster monitor enabled
VIA Interconnect is down (link up), takeover capability off-line (unsynchronized log)
takeover by partner off-line (unsynchronized log)
partner update TAKEOVER_DISABLED (09Sep2009 04:40:26)
Then, in 'priv set diag', the output of 'cf monitor all':
cf: Current monitor status (09Sep2009 04:41:54):
partner 'c0', VIA Interconnect is down (link up)
state UP, time 578790, event CHECK_FSM, elem ChkMbValid (13)
mirrorConsistencyRequired TRUE
takeoverByPartner 0x2001 <NVRAM_DOWN,TAKEOVER_ON_PANIC>
mirrorEnabled TRUE, lowMemory FALSE, memio UNINIT, killPackets TRUE
degraded FALSE, reservePolicy ALWAYS_AFTER_TAKEOVER, resetDisks TRUE
hw_assist status:
hw_assist Inactive on c1: c1 not monitoring alerts from partner(c0)
hw_assist Inactive on c0: c0 not monitoring alerts from partner c1
timeouts:
fast 1000, slow 0, mailbox 2500, connect 0
operator 600000, firmware 0 (recvd 15000), dumpcore 576790
booting 300000 (recvd 0)
transit timer enabled TRUE, transit 600000 (last 0)
mailbox disks:
Disk 0c.09.4 is a local mailbox disk
Disk 0c.09.5 is a local mailbox disk
Disk 0c.09.0 is a partner mailbox disk
Disk 0c.09.1 is a partner mailbox disk
primary state:
version 2, senderSysid <x>
cluster_time 1252471144, hbt 237, node_status TAKEOVER_DISABLED
info 0x2001 <NVRAM_DOWN,TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_MAILBOX, abs_time 1252471313, sk_time 577790
channel_status 0
channel CHANNEL_IC, abs_time 1252471309, sk_time 573790
channel_status 5
channel CHANNEL_NETWORK, abs_time 0, sk_time 0
channel_status -1
backup state:
version 2, senderSysid <x>
cluster_time 1252471144, hbt 4950, node_status TAKEOVER_DISABLED
info 0x2001 <NVRAM_DOWN,TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_MAILBOX, abs_time 1252471313, sk_time 577260
channel_status 0
Channel Read Ctx:
version 2, senderSysid <x>
cluster_time 1252471144, hbt 4950, node_status TAKEOVER_DISABLED
info 0x2001 <NVRAM_DOWN,TAKEOVER_ON_PANIC>
flags 0x0 <>
channel CHANNEL_IC, abs_time 0, sk_time 0
channel_status 3
Channel Read Ctx:
version 2, senderSysid 0
cluster_time 0, hbt 0, node_status UNKNOWN
info 0x0 <>
flags 0x0 <>
channel CHANNEL_NETWORK, abs_time 0, sk_time 0
channel_status -1
Channel Read Ctx:
version 2, senderSysid 0
cluster_time 0, hbt 0, node_status UNKNOWN
info 0x0 <>
flags 0x0 <>
takeoverState FT_NONE, takeoverString 'No takeover information'
givebackState FT_NONE, givebackString 'No giveback information'
givebackRetries 0, givebackRequested FALSE
autoGivebackEnabled FALSE, autoGivebackWasDone FALSE, autoGivebackCifsStopping FALSE
autoGivebackLastVetoCheck 0, autoGivebackAttemptsExceeded FALSE
Maximum primary disk mailbox io times: normal = 245, transition = 0
Maximum backup disk mailbox io times: normal = 307, transition = 0
Num times logs unsynced : 0
Total system uptime: 579079 msec
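Since the 'cf monitor all' dump is long, here is a small Python sketch that lists each channel together with the `channel_status` line that follows it. The function name `channel_statuses` is hypothetical, and the parsing assumes the layout shown above. What the numeric codes mean is not stated anywhere in this thread, so the sketch only extracts them and does not interpret them.

```python
import re

# A "channel CHANNEL_X, ..." line followed later by "channel_status N".
# Both patterns are inferred from the 'cf monitor all' output in this thread.
CHANNEL_RE = re.compile(r"^\s*channel\s+(CHANNEL_\w+),")
STATUS_RE = re.compile(r"^\s*channel_status\s+(-?\d+)\s*$")


def channel_statuses(output):
    """Return (channel name, status code) pairs in the order they appear."""
    pairs = []
    current = None
    for line in output.splitlines():
        match = CHANNEL_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = STATUS_RE.match(line)
        if match and current is not None:
            pairs.append((current, int(match.group(1))))
            current = None
    return pairs


# The "primary state" channel lines copied from the output above.
SAMPLE = """\
channel CHANNEL_MAILBOX, abs_time 1252471313, sk_time 577790
channel_status 0
channel CHANNEL_IC, abs_time 1252471309, sk_time 573790
channel_status 5
channel CHANNEL_NETWORK, abs_time 0, sk_time 0
channel_status -1
"""

if __name__ == "__main__":
    for name, status in channel_statuses(SAMPLE):
        print(f"{name}: {status}")
```

Running it against the same dump taken on c0 would make it easy to see whether CHANNEL_IC reports a different code on the healthy head.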
I don't know if this will help, but I had a problem when I configured our FAS2050 and used an IP address as the name of the virtual interface (used for Cisco EtherChannel). Once I gave it a name (instead of an IP address), it worked.
Randy,
This looks to me like some kind of config issue before I would call it a hardware issue. Have you had a duplicate IP address on the network at any time?
Thanks;
Daniel
When I configured the clustering, I specified the partner as the IP address. I tested failover, and it wouldn't work. So I went back and specified the partner address as the interface name: topvif for the bottom controller, and botvif for the top controller.
After that, it worked.
When you say "I specified the partner as the ip address" -- do you mean in the IP takeover section, or elsewhere?
Hi Daniel,
Was this supposed to be addressed at me?
There have not been duplicate IPs on the network.
Thanks!
-Nate
Nate,
If you don't find any duplicate address warning on the console, then it should be fine. But you should always check the configuration side carefully.
I will try to find out more about this issue and post an update if I have one.
Thanks
Daniel
Hmm, interesting. We don't use an EtherChannel (virtual interface), just a single IP, but can you post a config example of the issue you had (before/after)?
I can't easily get the configuration, but the fix was specifying the partner interface name instead of the IP address. I forget where, though (it was over a year ago).
OK, so I tried doing:
ifconfig e0a `hostname`-e0a mediatype auto flowcontrol full partner e0a
...specifying the name of the partner interface instead of the IP. I made this change on both nodes and still get the exact same error.
If you want to set up EtherChannel, your rc should look something like this:
hostname toto
ifconfig e0a down
ifconfig e0b down
vif create lacp toto_trunck -b ip e0a e0b
vlan create toto_trunck 26 1
ifconfig toto_trunck-1 172.16.0.5 netmask 255.255.255.0 mtusize 1500 partner 172.16.0.6 -wins
ifconfig toto_trunck-26 192.168.26.5 netmask 255.255.255.0 mtusize 1500 partner 192.168.26.6 -wins
route add default 172.16.0.254 1
routed on
options dns.domainname toto.intranet
options dns.enable on
options nis.enable off
savecore
My node name is toto. I created an EtherChannel named toto_trunck, added two VLANs (1 and 26), and set an IP address and the partner IP on each VLAN.
Here is the Cisco config (don't forget to create the VLANs):
interface GigabitEthernet0/16
description toto e0a
switchport trunk native vlan 9
switchport mode trunk
channel-group 3 mode active
end
cata-giga#sh run int gig 0/22
Building configuration...
Current configuration : 145 bytes
!
interface GigabitEthernet0/22
description toto e0b
switchport trunk native vlan 9
switchport mode trunk
channel-group 3 mode active
end
cata-giga#sh run int po 3
Building configuration...
Current configuration : 111 bytes
!
interface Port-channel3
description lacp toto
switchport trunk native vlan 9
switchport mode trunk
We replaced c1 with a new FAS2050 controller, and it all works perfectly now.