ONTAP Discussions
Hi
I have an issue with an ifgrp configured for LACP on a FAS8020 controller in cluster mode. Each controller has e0c and e0d configured as an LACP ifgrp. However, the ifgrp on node 1 is showing as disabled, although when I check e0c and e0d individually, both show a status of UP. The two links go to separate Juniper switches; on the switches both ports show as UP, but their LACP status is ATTACHED. The working ifgrp on controller node 2 shows port and ifgrp status UP, and the Juniper switches show its LACP status as COLLECTING DISTRIBUTING.
Everything was working fine until a few days ago, when all the LIFs on the controller node 1 ifgrp (the one that is no longer working) failed over one night.
We've disabled and re-enabled the ports from the switches, and also e0c, e0d and the ifgrp from the NetApp controller. Our network guys have checked their config and logs, and it all seems to be fine. I've pulled the LACP log from controller node 1; a partial extract is below.
So, to break this down: both configurations are identical and had been working fine until a few days ago.
Controller node 1 -
ifgrp a0a: comprises e0c and e0d in LACP mode. Was working perfectly, but is now in a disabled state. Both e0c and e0d are in the up state.
Controller node 2 -
ifgrp a0b: comprises e0c and e0d in LACP mode. a0b status is enabled. Both e0c and e0d are in the up state.
Extract from LACP log controller node 1
---
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0d)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0c)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0c)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0d)
2016-07-29 02:32:22: ERROR: Rx_machine (Actor- Remote device, Partner- Local):
LAG_PKT: View of partner incorrect:e0c moving select to unselected
Actor - SysPri:127 , SysID: , key: 247, PortPri: 127, PortID:52
Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
Partner - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
---
2016-07-29 02:32:22: ERROR: Prev_stored (Actor- Local, Partner- Remote device):
LACP state information: e0c
Actor - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
Partner - SysPri:0 , SysID: 0:0:0:0:0:0, key: 0, PortPri: 0, PortID:0
Actv:0 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
---
2016-07-29 02:32:22: ERROR: Rx_machine (Actor- Remote device, Partner- Local):
LAG_PKT: e0c setting partner port sync to FALSE
Actor - SysPri:127 , SysID: , key: 247, PortPri: 127, PortID:52
Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
Partner - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
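For anyone trying to read the flag lines in the extract above, here is a minimal Python sketch that decodes one of them into named booleans. The mapping from the log's field names (Actv, Timeout, Agg, Sync, Coll, Dist, Default, Expired) to the IEEE 802.1AX port state bits is an assumption based on the names, and `parse_lacp_flags` and `is_passing_traffic` are hypothetical helper names, not part of ONTAP.

```python
import re

# Field names as printed in the log, with the 802.1AX state bit each one
# appears to correspond to (this mapping is an assumption, not NetApp docs).
FLAG_MEANINGS = {
    "Actv": "LACP activity (1 = active mode)",
    "Timeout": "LACP timeout (1 = short, 0 = long)",
    "Agg": "aggregation (link may be aggregated)",
    "Sync": "synchronization (port is in sync with its aggregate)",
    "Coll": "collecting (accepting inbound frames)",
    "Dist": "distributing (sending outbound frames)",
    "Default": "defaulted (using default partner info, none received)",
    "Expired": "expired (receive machine timed out)",
}

FLAG_RE = re.compile(r"(\w+)\s*:\s*([01])\b")


def parse_lacp_flags(line):
    """Return {flag_name: bool} for one 'Actv:1 ,Timeout:1, ...' log line."""
    return {
        name: value == "1"
        for name, value in FLAG_RE.findall(line)
        if name in FLAG_MEANINGS
    }


def is_passing_traffic(flags):
    """True only when the port is in sync, collecting and distributing."""
    return all(flags.get(name, False) for name in ("Sync", "Coll", "Dist"))


# The two flag lines from the first Rx_machine entry in the extract.
actor = parse_lacp_flags(
    "Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0")
partner = parse_lacp_flags(
    "Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1")

for label, flags in (("Actor", actor), ("Partner", partner)):
    set_bits = [name for name, is_set in flags.items() if is_set]
    print(label, "set bits:", set_bits, "passing traffic:", is_passing_traffic(flags))
```

If that mapping holds, neither side in the extract is in sync, collecting or distributing, and the Partner line has both Default and Expired set. That combination would be consistent with a port that has stopped accepting LACPDUs from its peer, though the log alone does not say why.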
Please can someone point me in the right direction as to where the problem may be and a possible fix. Your input is greatly appreciated.
Cheers
Hi
After a lengthy support call with NetApp, NetApp put this down to an issue with the QLogic CNA card. While collecting stats for NetApp I was asked to run the steps listed below. Upon running the "cna dump e0c" command, the ifgrp began to work again. At the time we were running an older version of ONTAP 8.3 and were advised to upgrade, as fixes had been released in newer versions. We are now running ONTAP 9.1 P5. I hope this helps.
Trigger ASUP
Wait 1 minute
Trigger ASUP
Upload command outputs
Upload RAStraces from
/mroot/etc/log/rastrace -> QLA_*.dmp (Rastrace dump)
Upload CNA firmware dump from
/mroot/etc/log -> qla_<port>_*.dmp (FW dump)
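If it helps with the upload steps, here is a small Python sketch that gathers the two sets of dump files by the patterns given above. It assumes the contents of /mroot/etc/log have already been copied to a local directory; `find_dump_files` and the `./etc_log_copy` path are hypothetical names for illustration, and the port is passed in rather than guessed.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def _matching_files(directory, pattern):
    """Sorted files in `directory` whose names match `pattern` (case-sensitive)."""
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and fnmatchcase(p.name, pattern)
    )


def find_dump_files(log_root, port):
    """Return (rastrace_dumps, firmware_dumps) found under a copy of etc/log.

    log_root: local copy of /mroot/etc/log (assumed to exist locally).
    port:     port name used in the firmware dump file name, e.g. "e0c".
    """
    log_root = Path(log_root)
    rastrace_dumps = _matching_files(log_root / "rastrace", "QLA_*.dmp")
    firmware_dumps = _matching_files(log_root, f"qla_{port}_*.dmp")
    return rastrace_dumps, firmware_dumps


if __name__ == "__main__":
    # Hypothetical local copy of the controller's etc/log directory.
    rastraces, fw_dumps = find_dump_files("./etc_log_copy", "e0c")
    for path in rastraces + fw_dumps:
        print(path)
```

Matching is done case-sensitively on the file name so that the upper-case `QLA_` rastrace dumps and the lower-case `qla_` firmware dumps are not mixed up.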
Hi,
Hope this helps https://kb.netapp.com/support/s/article/faq-troubleshooting-lacp-port-channel-interface-groups
Thanks
Hi @Singhz
Please check the LACP configuration on the network switch side. Please ask your network team to take a look at that configuration.
Did you get this issue resolved? I'm experiencing the same issue right now.