
iSCSI connection failure on LACP re-balance


We are seeing iSCSI connections drop on our Linux machines, which are connected to a NetApp FAS2552 (Data ONTAP 8.3.2RC2).


This is what we get in our log files.


Linux server (/var/log/syslog /var/log/daemon.log /var/log/messages):

Sep  5 11:35:26 hostname kernel: [3621054.106509]  connection2:0: ping timeout of 10 secs expired, recv timeout 5, last rx 5200138331, last ping 5200135331, now 5200142081
Sep  5 11:35:26 hostname kernel: [3621054.106611]  connection2:0: detected conn error (1011)
Sep  5 11:35:27 hostname iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Sep  5 11:35:32 hostname iscsid: connection2:0 is operational after recovery (1 attempts)
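The kernel message stores its timing state as jiffy counters ("last rx", "last ping", "now"). The parser below is a minimal sketch that turns those counters into deltas. The regex and field names are our own and are derived only from the line quoted above. Converting jiffies to seconds requires the host kernel's HZ value, which the post does not give, so HZ is passed in as an assumed parameter.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches the open-iscsi kernel message quoted above; the group names simply
# mirror the wording of the log line.
PING_TIMEOUT_RE = re.compile(
    r"ping timeout of (?P<ping>\d+) secs expired, recv timeout (?P<recv>\d+), "
    r"last rx (?P<last_rx>\d+), last ping (?P<last_ping>\d+), now (?P<now>\d+)"
)


@dataclass
class PingTimeout:
    ping_timeout_s: int      # "ping timeout of N secs"
    recv_timeout_s: int      # "recv timeout N"
    since_rx_jiffies: int    # now - last rx
    since_ping_jiffies: int  # now - last ping

    def seconds(self, hz: int) -> tuple[float, float]:
        """Convert both deltas to seconds for an ASSUMED kernel HZ value."""
        return self.since_rx_jiffies / hz, self.since_ping_jiffies / hz


def parse_ping_timeout(line: str) -> PingTimeout | None:
    """Return the parsed timing fields, or None if the line doesn't match."""
    m = PING_TIMEOUT_RE.search(line)
    if m is None:
        return None
    now = int(m["now"])
    return PingTimeout(
        ping_timeout_s=int(m["ping"]),
        recv_timeout_s=int(m["recv"]),
        since_rx_jiffies=now - int(m["last_rx"]),
        since_ping_jiffies=now - int(m["last_ping"]),
    )


line = ("Sep  5 11:35:26 hostname kernel: [3621054.106509]  connection2:0: "
        "ping timeout of 10 secs expired, recv timeout 5, last rx 5200138331, "
        "last ping 5200135331, now 5200142081")
event = parse_ping_timeout(line)
print(event)
print(event.seconds(hz=250))  # HZ=250 is an assumption, not taken from the post
```

Feeding every matching syslog line through the parser makes it easy to check whether each drop shows the same gap pattern.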

NetApp Eventlog:

09/05/2016 09:35:28  node-1  warning  iswti_iscsip_thread  iscsi.warning: ISCSI: New session request from initiator iqn.1993-08.org.debian:01:5eba8db68f34, a session from this initiator already exists.
09/05/2016 09:35:31  node-1  notice  iswti_iscsip_thread  iscsi.notice: ISCSI: New session from initiator iqn.1993-08.org.debian:01:5eba8db68f34 at IP addr
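Read side by side, the two excerpts appear to describe a single reconnect. The timestamps differ by roughly two hours, which suggests that the Linux host logs in UTC+2 and the NetApp event log uses UTC. The post does not confirm this, so both time zones are labeled assumptions in the sketch below. It merges the quoted events into one timeline.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: Linux syslog is local time at UTC+2 and the NetApp event log is
# UTC. Neither time zone is stated in the post.
HOST_TZ = timezone(timedelta(hours=2))
FILER_TZ = timezone.utc

# Events copied (abbreviated) from the log excerpts above. Syslog omits the
# year, so the date is taken from the NetApp log.
host_events = [
    ("11:35:26", "ping timeout, detected conn error (1011)"),
    ("11:35:27", "iscsid: ISCSI_ERR_CONN_FAILED"),
    ("11:35:32", "iscsid: operational after recovery"),
]
filer_events = [
    ("09:35:28", "new session request, session already exists"),
    ("09:35:31", "new session from initiator"),
]


def at(day: str, hms: str, tz: timezone) -> datetime:
    """Build a time-zone-aware timestamp from a date and an HH:MM:SS string."""
    return datetime.strptime(f"{day} {hms}", "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)


# Aware datetimes in different zones compare correctly, so sorting works.
timeline = sorted(
    [(at("2016-09-05", t, HOST_TZ), "linux", msg) for t, msg in host_events]
    + [(at("2016-09-05", t, FILER_TZ), "netapp", msg) for t, msg in filer_events]
)
start = timeline[0][0]
for ts, source, msg in timeline:
    print(f"+{(ts - start).total_seconds():4.0f}s  {source:<6}  {msg}")
```

Under that assumption, the filer sees the new session request about two seconds after the host detects the connection error. The host reports the connection as operational again about a second after the filer logs the new session.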

Our setup:



When the Linux server rebalances its outgoing traffic, the packets of an existing iSCSI session can move to the other link in the LACP trunk. The switches prefer to forward traffic out of a directly attached port, so the NetApp sees the session's incoming traffic move to another active port within the LACP vif. It currently seems that the NetApp ignores incoming traffic for an iSCSI session once that traffic arrives on a different port, which causes the session to time out.
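In an LACP aggregate, each sender chooses the member link from a hash of header fields, so all packets of one TCP connection normally use the same link. If the inputs to that choice change, the same iSCSI connection can show up on a different port at the receiving end, and the receiver is expected to accept it there. The function below is a simplified model of a layer3+4 transmit hash. It loosely follows the formula in the Linux bonding documentation, and real kernels differ by version. The function name and addresses are hypothetical. Port 3260 is the standard iSCSI target port.

```python
import ipaddress


def pick_member(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                n_links: int) -> int:
    """Return the index of the aggregate member this flow would be sent on.

    Simplified layer3+4 hash for illustration only; not the kernel's exact code.
    """
    if n_links < 1:
        raise ValueError("need at least one active link")
    h = (src_port << 16) | dst_port  # both L4 ports packed into one 32-bit word
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16                     # fold high bits down so they affect the result
    h ^= h >> 8
    return h % n_links               # same inputs + same link count -> same member


# Hypothetical initiator/target addresses and initiator source port.
flow = ("192.0.2.10", "192.0.2.20", 51000, 3260)
for n in (1, 2, 3):
    print(n, "active links -> member", pick_member(*flow, n_links=n))
```

The choice is deterministic for fixed inputs, so a flow only moves when something feeding the selection changes. That can be the number of active members, or a host-side rebalance like the one described above. Either change is invisible to TCP, but the receiver then has to accept the session's packets on the new port.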


How can we fix this?



Upgrading to 8.3.2P7 solved the issue.


Before upgrading, we got some help and hints from NetApp Technical Support. Unfortunately, they couldn't point to the exact release or bug number in which this issue was fixed. One of the things we tried was disabling Fast Path and reconnecting the iSCSI initiators, but that didn't solve it. Only after upgrading to 8.3.2P7 did the issue disappear.
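If you manage several nodes, a quick check can tell you which ones are still below the release that resolved this for us. The sketch below is only an illustration. It assumes three-component Data ONTAP 8.x version strings like the two in this thread and assumes the usual ordering RCn < GA < Pn. The function names are hypothetical. Support could not name a fixing release or bug, so 8.3.2P7 is simply where the problem went away for us. An earlier patch release might also contain the fix.

```python
import re

# Version strings as they appear in this thread: 8.3.2RC2 (release candidate),
# 8.3.2P7 (patch release). ASSUMED ordering: RCn < GA < Pn.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(RC|P)(\d+))?$")


def version_key(version: str) -> tuple:
    """Turn an 8.x version string into a tuple that sorts in release order."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, micro, kind, num = m.groups()
    rank = {"RC": 0, None: 1, "P": 2}[kind]  # RC before GA, GA before patches
    return (int(major), int(minor), int(micro), rank, int(num or 0))


def at_least(installed: str, minimum: str = "8.3.2P7") -> bool:
    """True if `installed` sorts at or after `minimum`."""
    return version_key(installed) >= version_key(minimum)


print(at_least("8.3.2RC2"))  # False: the release that showed the drops
print(at_least("8.3.2P7"))   # True: the release where the drops stopped for us
```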