ONTAP Discussions

Problem with second level interface group

michael_w_grice

I am having an issue with a second level interface group in 8.0.1. It consists of an LACP interface group with three links attached to one switch and a single interface attached to a second switch. If I create the second level interface group without adding the single link, naturally it works fine. If I add the single link (either as a physical link or as a single mode vif with only one interface), it works briefly, then I get error messages like the ones below and the link stops working completely.

CATONETAPPA01> Wed May 18 09:53:01 EST [FILER: pvif.failedLinkMonitoring:error]: group0: link-monitoring logic failed
Wed May 18 09:53:01 EST [FILER: pvif.failedLinkMonitoring:error]: single0: link-monitoring logic failed
Wed May 18 09:53:01 EST [FILER: pvif.allLinksDown:CRITICAL]: eth0: all links down

The single interface works fine either on its own or in an interface group by itself (i.e., I can add an IP address and it stays up).

To spell out a little better what I am doing, this works fine:

ifgrp create lacp ifgrp1 -b ip e0a e0b e0c

ifconfig ifgrp1 10.1.1.1 netmask 255.255.255.0

Running this nukes it:

ifgrp add ifgrp1 e0d

The same for this:

ifgrp create single ifgrp2 e0d

ifgrp add ifgrp1 ifgrp2

Creating everything at the same time also causes the same issues.

I did see a bug (324514) which looks like it might be applicable, but it's supposed to be fixed in 8.0.1.

Any thoughts here? We've double-checked the switch config and the cabling and don't see any obvious problems. The three switch ports attached to the interfaces for the LACP link are configured for LACP, and the fourth switch port for e0d for the backup is on a different switch.

FILER> ifgrp status
default: transmit 'IP Load balancing', Ifgrp Type 'multi_mode', fail 'log'
group0: 3 links, transmit 'IP Load balancing', Ifgrp Type 'lacp' fail 'default'
         Ifgrp Status   Up      Addr_set
        trunked: eth0
        up:
        e0c: state up, since 18May2011 10:26:56 (01:25:26)
                mediatype: auto-1000t-fd-up
                flags: enabled
                active aggr, aggr port: e0a
                input packets 58834, input bytes 6348064
                input lacp packets 190, output lacp packets 178
                output packets 209, output bytes 23622
                up indications 2, broken indications 0
                drops (if) 0, drops (link) 0
                indication: up at 18May2011 10:26:56
                        consecutive 0, transitions 2
        e0b: state up, since 18May2011 10:26:56 (01:25:26)
                mediatype: auto-1000t-fd-up
                flags: enabled
                active aggr, aggr port: e0a
                input packets 1200, input bytes 95647
                input lacp packets 191, output lacp packets 180
                output packets 44208, output bytes 22018444
                up indications 2, broken indications 0
                drops (if) 0, drops (link) 0
                indication: up at 18May2011 10:26:56
                        consecutive 0, transitions 2
        e0a: state up, since 18May2011 10:26:56 (01:25:26)
                mediatype: auto-1000t-fd-up
                flags: enabled
                active aggr, aggr port: e0a
                input packets 1156, input bytes 90247
                input lacp packets 191, output lacp packets 178
                output packets 1727, output bytes 182896
                up indications 2, broken indications 0
                drops (if) 0, drops (link) 0
                indication: up at 18May2011 10:26:56
                        consecutive 0, transitions 2
eth0: 1 link, transmit 'none', Ifgrp Type 'single_mode' fail 'default'
         Ifgrp Status   Up      Addr_set
        up:
        group0: state up, since 18May2011 10:26:56 (01:25:26)
                mediatype: Enabled interface groups
                flags: enabled
                input packets 61190, input bytes 6533958
                output packets 46144, output bytes 22224962
                output probe packets 0, input probe packets 0
                strike count: 0 of 10
                up indications 1, broken indications 0
                drops (if) 0, drops (link) 0
                indication: up at 18May2011 10:26:56
                        consecutive 5125, transitions 1

1 ACCEPTED SOLUTION

aborzenkov

This message usually means that the two ports the single-mode VIF's interfaces are connected to are not in the same L2 (broadcast) domain. NetApp tries to verify single-mode VIF connectivity by broadcasting from each interface and checking whether those broadcast packets are received on the other interfaces.

Another possible cause is that the switch filters out (blacklists) those packets. One plausible reason is that these packets use a different MAC address from the common VIF MAC, and the switch may prohibit two different MAC addresses on a host-connected port.

Check inter-switch connectivity, and check the switch statistics for dropped or rejected packets.
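
On the filer side, something like the following may help narrow it down. This is only a sketch, assuming the interface names from the original post (eth0 as the single-mode group, e0d as the backup port); substitute your own names.

ifgrp status eth0

ifstat e0d

The first should show the per-link state along with the probe packet counters and strike count for the single-mode group, and the second the error and drop counters for the physical port.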

michael_w_grice

Thanks. After working with the guy who manages the switches in question, we resolved the issue. The cable attached to e0d appears to have been bad.

shaunjurr

Hi,

Basically, the ability to run port aggregation/etherchannel also depends on the switch supporting it. Since you are running an active aggregation of multiple links over more than one switch, the switches have to support etherchannel/port aggregation across multiple switches. Only a limited number of switches support this. The end result, as was already posted, is that the switches' spanning-tree exchanges notice the same MAC address on more than one switch and, to avoid loops, block one or more ports.

If your switches don't support link aggregation over multiple switches, you need to set up 2 levels of link aggregation: one "ifgrp" with the original 3 interfaces and then a "failover" ifgrp above that in which you have one passive port for failover and 3 active ports, like this:

ifgrp create lacp ifgrp1 -b ip e0a e0b e0c

ifgrp create single s_ifgrp1 ifgrp1 e0d

ifgrp favor ifgrp1

Otherwise you basically have to fix your choice of switches or switch configuration to support what you are trying to do.

michael_w_grice

Well, we're not actually running port aggregation over multiple switches. The port aggregation links are all over the same switch, with a backup link to the second switch.

shaunjurr

The example that I posted will get you a "backup" link. The examples that you posted will get you blocked interfaces. You can't add e0d to an existing ifgrp if that configuration isn't supported by the switches, i.e. on two separate switches that can't build etherchannel/link aggregate groups. You can't just pack e0d into another ifgrp and force them together either.

I'm not sure how you think you solved this, but with the configuration attempts that you posted, you are unlikely to have success.
