I thought I would mention this behaviour in case it is a surprise to anyone else.
Each NetApp iSCSI LUN is mapped to an initiator group and a portset. In most cases an initiator group should contain the unique iSCSI initiator ID of a single SAN client. A portset is the list of target NetApp IP addresses that clients should communicate with. It needs at least an IP for the source server plus one for its HA sibling, but in a cDOT cluster, where the source servers are virtual, this ends up being an IP for each physical node that the vservers could reside on. When an iSCSI client runs discovery for its LUN(s) against the iSCSI ID (IQN) of the source SVM using one of the IPs in that portset, the SVM must respond with the list of all IPs in the portset so that the client knows about the other potential network paths to its LUN(s). Data traffic should flow primarily through one of those target IPs (in our case the one on the physical node that owns the disk aggregate the LUN resides on, which is not necessarily the IP used in discovery), and multipathing should report at least one alternative instance of the LUN available via a different physical node IP (not necessarily that of the physical node where the SVM’s HA sibling resides).
This is in accordance with IETF standards (see p228-231 of Appendix D in https://www.ietf.org/rfc/rfc3720.txt):
In a discovery session, a target MAY respond to a SendTargets request with its complete list of targets, or with a list of targets that is based on the name of the initiator logged in to the session. A SendTargets response MUST NOT contain target names if there are no targets for the requesting initiator to access. … Multiple-connection sessions can span iSCSI addresses that belong to the same portal group. Multiple-connection sessions cannot span iSCSI addresses that belong to different portal groups. If a SendTargets response reports an iSCSI address for a target, it SHOULD also report all other addresses in its portal group in the same response.
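To make the quoted rule a little more concrete, here is a minimal sketch (not anything NetApp or open-iscsi ships) that groups the addresses in a SendTargets text response by target name and portal group tag. The `TargetName=` and `TargetAddress=address:port,tag` key format is the one RFC 3720 describes; the function name `parse_sendtargets`, the IQN and the IPs are made up for illustration.

```python
from collections import defaultdict


def parse_sendtargets(text):
    """Group TargetAddress values by target name and portal group tag.

    Returns {target_name: {portal_group_tag: [address, ...]}}.
    The tag is None when a TargetAddress carries no ",tag" suffix.
    """
    targets = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        if key == "TargetName":
            current = value
            targets[current]  # register the target even if no address follows
        elif key == "TargetAddress" and current is not None:
            addr, sep, tag = value.rpartition(",")
            if not sep:  # no portal group tag present
                addr, tag = value, None
            targets[current][tag].append(addr)
    return {name: dict(groups) for name, groups in targets.items()}


# Hypothetical response: one target, two portal groups.
sample = """\
TargetName=iqn.2000-01.com.example:svm1
TargetAddress=10.1.0.1:3260,1
TargetAddress=10.1.0.2:3260,1
TargetAddress=10.2.0.1:3260,2
"""

for name, groups in parse_sendtargets(sample).items():
    for tag, addrs in groups.items():
        print(name, "portal group", tag, addrs)
```

Read against the quote: once 10.1.0.1 is reported for this target, the response SHOULD also carry 10.1.0.2 because both sit in portal group 1. Whether it also carries addresses from other portal groups is up to the target.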
Thus the source SVM should respond with at least all target IPs in the relevant portset, must not respond with completely irrelevant IPs, but is allowed to respond with its full list of target IPs. One of the SVMs in our 6-node cluster has 2 portsets defined on different subnets, and our clients only have network access to the IPs in one portset or the other. We have discovered that, instead of responding to an iSCSI SendTargets request with just the 6 IPs from the mapped portset, this SVM responds with all 12 IPs from both portsets.
We have a Red Hat 7.2 client with 3 LUNs from this SVM, with systemd configured to delay starting an application at boot until its required LUNs are available and mounted. I believe the additional 6 target IPs are delaying LUN discovery long enough that systemd times out and the application fails to start. I was seeing the same impact on an identical iSCSI client with 6 LUNs from a different SVM that had only one portset. This 2nd client must have attempted discovery against 2 additional SVMs at some point, so it had 3 IQNs under /var/lib/iscsi/nodes and was seeing 18 target IPs. Systemd was also timing out while discovering LUNs there, so it would fail to start the same application at boot, until I removed the additional IQNs and reduced the list to just the 6 relevant target IPs in its mapped portset.
I may be able to resolve the 1st client's problem (seeing IPs from the 2nd, unreachable portset) by defining an access list on its SVM or by using iptables, but ultimately that's a band-aid for what will be a short-term scenario. I've been told that responding with all target IPs is long-standing behaviour, though, and that changing it would have unknown impact.
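For anyone wanting to check whether a client is in the same situation, here is a rough sketch of how the discovered portals could be split into those inside the subnet the client can actually reach and those outside it. It assumes discovery records in the `ip:port,tag iqn` line format that open-iscsi discovery normally prints. The function name `split_by_reachability`, the subnet, the IQN and the IPs are all hypothetical.

```python
import ipaddress


def split_by_reachability(discovery_output, reachable_cidr):
    """Split discovery records into (reachable, unreachable) lists.

    Each record is an (iqn, "ip:port") tuple. A portal counts as reachable
    only if its IP address lies inside reachable_cidr.
    """
    network = ipaddress.ip_network(reachable_cidr)
    reachable, unreachable = [], []
    for raw in discovery_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        portal, iqn = line.split(None, 1)
        address = portal.split(",", 1)[0]  # drop the portal group tag
        host = address.rsplit(":", 1)[0].strip("[]")  # drop port, IPv6 brackets
        record = (iqn, address)
        if ipaddress.ip_address(host) in network:
            reachable.append(record)
        else:
            unreachable.append(record)
    return reachable, unreachable


# Hypothetical discovery output: two portals on the client's subnet, one not.
sample_discovery = """\
10.1.0.1:3260,1000 iqn.2000-01.com.example:svm1
10.1.0.2:3260,1001 iqn.2000-01.com.example:svm1
10.2.0.1:3260,1002 iqn.2000-01.com.example:svm1
"""

ok, extra = split_by_reachability(sample_discovery, "10.1.0.0/24")
print("reachable:", ok)
print("outside client subnet:", extra)
```

Anything in the second list is a portal the client will still try at boot, so those are the candidates for cleanup or for blocking at the access-list level.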