NetApp cluster mode ifgrp LACP problem

Singhz

Hi

 

I have an issue with an ifgrp configured as LACP on a FAS8020 controller in cluster mode. Each controller has e0c and e0d configured as LACP; however, one of my ifgrps, the one on node 1, is showing as disabled, although when I check e0c and e0d individually, both show a status of UP. The two links connect to separate Juniper switches, where both ports are in the UP state but show an LACP status of ATTACHED. The working ifgrp on controller node 2 shows port and ifgrp status UP, and the Juniper switches show its ports as COLLECTING DISTRIBUTING.

Everything was working fine until a few days ago, when all the LIFs on the node 1 ifgrp (the one that is no longer working) failed over one night.

 

We've disabled and re-enabled the ports from the switches and also from the NetApp controller for e0c, e0d and the ifgrp. Our network guys have checked their config and logs, and everything seems to be fine. I've pulled the LACP log from controller node 1; an extract is below.

 

So let me break this down: both configurations are identical and had been working fine until a few days ago.

 

Controller node 1 -

 

ifgrp a0a: comprises e0c and e0d in LACP mode. It was working perfectly but is now in a disabled state. Both e0c and e0d are in the up state.

 

Controller node 2 -

 

ifgrp a0b: comprises e0c and e0d in LACP mode. a0b status is enabled. Both e0c and e0d are in the up state.

 

 

 

Extract from LACP log controller node 1

 

---
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0d)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0c)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0c)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0d)
2016-07-29 02:32:22: ERROR: Rx_machine (Actor- Remote device, Partner- Local):
LAG_PKT: View of partner incorrect:e0c moving select to unselected
Actor - SysPri:127 , SysID: , key: 247, PortPri: 127, PortID:52
Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
Partner - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
---
2016-07-29 02:32:22: ERROR: Prev_stored (Actor- Local, Partner- Remote device):
LACP state information: e0c
Actor - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
Partner - SysPri:0 , SysID: 0:0:0:0:0:0, key: 0, PortPri: 0, PortID:0
Actv:0 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
---
2016-07-29 02:32:22: ERROR: Rx_machine (Actor- Remote device, Partner- Local):
LAG_PKT: e0c setting partner port sync to FALSE
Actor - SysPri:127 , SysID: , key: 247, PortPri: 127, PortID:52
Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
Partner - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
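
As a reading aid for the extract above, here is a minimal Python sketch that decodes the flag lines. It assumes the flag names (Sync, Coll, Dist, Default, Expired and so on) carry the usual meanings of the LACP state bits in IEEE 802.1AX; the function names are made up for illustration and are not part of any NetApp tooling. Read that way, the local side reports Default:1 and Expired:1, which would suggest it was not accepting LACPDUs from the switch at that moment, but that is an interpretation, not a confirmed diagnosis.

```python
import re

# Flag names as they appear in the log extract; the meanings assumed here
# follow the LACP state bits defined in IEEE 802.1AX.
FLAG_RE = re.compile(
    r"(Actv|Timeout|Agg|Sync|Coll|Dist|Default|Expired)\s*:\s*([01])"
)


def parse_flags(line):
    """Return a dict such as {'Sync': 0, ...} from one flag line of the log."""
    return {name: int(value) for name, value in FLAG_RE.findall(line)}


def describe(flags):
    """Summarise one side's LACP state in plain words."""
    notes = []
    if flags.get("Sync") and flags.get("Coll") and flags.get("Dist"):
        notes.append("in sync, collecting and distributing")
    else:
        notes.append("not passing traffic in the aggregate")
    if flags.get("Default"):
        notes.append("using defaulted partner information")
    if flags.get("Expired"):
        notes.append("receive state expired (no recent LACPDU accepted)")
    return "; ".join(notes)


if __name__ == "__main__":
    # Flag lines copied from the first Rx_machine record for e0c, where the
    # log labels the Actor as the remote device and the Partner as local.
    remote = parse_flags(
        "Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0"
    )
    local = parse_flags(
        "Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1"
    )
    print("remote:", describe(remote))
    print("local: ", describe(local))
```
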

 

 

Please can someone point me in the right direction as to where the problem may be and suggest a possible fix? Your input is greatly appreciated.

 

Cheers

1 ACCEPTED SOLUTION

Singhz

Hi

 

After a lengthy support call with NetApp, NetApp put this down to an issue with the QLogic CNA card. While collecting stats for NetApp, I was asked to run the commands below. Upon running the "cna dump e0c" command, the ifgrp began to work again. At the time we were running an older version of ONTAP 8.3 and were advised to upgrade, as fixes had been released in newer versions. We are now running ONTAP 9.1 P5. I hope this helps.

 

 

Trigger ASUP

  1. *> node run -node <nodename>
  2. *> priv set diag
  3. *> rastrace dump -m 46
  4. *> mbstat
  5. *> rtag -t mbuf
  6. *> ifstat -a
  7. *> ifinfo -a
  8. *> cna dump -d

 

Wait 1 minute

 

  1. *> mbstat
  2. *> rtag -t mbuf
  3. *> ifstat -a
  4. *> ifinfo -a
  5. *> cna dump 0c <<<< that is the corresponding fiber port #

 

Trigger ASUP

Upload command outputs

Upload RAStraces from

     /mroot/etc/log/rastrace  ->  QLA_*.dmp (Rastrace dump)

Upload CNA firmware dump from

     /mroot/etc/log  ->  qla_<port>_*.dmp (FW dump)
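
The procedure above runs the same commands twice, one minute apart, so the two passes can be compared. As a rough sketch of that comparison, assuming each pass of a command has been saved to its own text file (the file names below are hypothetical, and nothing here depends on the exact output format), Python's difflib can list the lines that changed:

```python
import difflib
from pathlib import Path


def changed_lines(before_text, after_text):
    """Return the lines that differ between two captures of the same command.

    Lines prefixed with "- " appear only in the first capture and lines
    prefixed with "+ " appear only in the second.
    """
    diff = difflib.ndiff(before_text.splitlines(), after_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop them.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical file names: one saved capture per pass of the same command.
    before = Path("ifstat_pass1.txt")
    after = Path("ifstat_pass2.txt")
    if before.exists() and after.exists():
        for line in changed_lines(before.read_text(), after.read_text()):
            print(line)
    else:
        print("Save both captures first, then re-run this script.")
```

This only highlights what moved between the two passes; interpreting the counters is still a job for support.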


REPLIES

hariprak

Hi,

 

Hope this helps: https://kb.netapp.com/support/s/article/faq-troubleshooting-lacp-port-channel-interface-groups

 

Thanks


Naveenpusuluru

Hi @Singhz

 

Please check the LACP configuration on the network switch side, and ask your network team to take a look at that configuration.

japinto1

Did you get this issue resolved? I'm experiencing the same issue right now.

