IHAC who is running some performance tests from an AIX system. The sticking point is that the customer wants to use every available FC port on the AIX server and every target port on the filer. The filer is running in single image mode and we're using AIX native MPIO software. The MPIO algorithm is set to "round-robin", and according to the AIX documentation this setting gives every port presented to the server priority 1. What we're seeing on the filer is that even though primary and secondary paths are clearly defined (verified with 'sanlun lun show -p'), there's still a considerable amount of traffic going over the cluster interconnect. Is there a way to prevent this?
You can actually use native round-robin on AIX 6.1 both with and without ALUA. The trick is that without ALUA you need to use the dotpaths command, which sets the path priorities so that secondary paths are not normally used. You also need the proper attributes on the hdisk devices so that paths return to the Enabled state after being in the Failed state.
Do not use "sanlun lun show -v -p" to display path priorities, or at least be sure to run cfgmgr first if you do. lspath is much better: the code path seems to be different for the two utilities, and lspath always displays the correct state from the AIX ODM.
Additionally, iostat has an option (-m/-M or something like that) to display *current* path utilization every second. Do a test: write a big 10G file and make observations...
By default lspath does not display priorities, but you can force it to do so. It needs a switch of the form "lspath -F dev,conn, etc, etc", a more verbose way of displaying things. The man pages for lspath, chpath, etc. are your friends on this one.
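To make the lspath check repeatable across many hdisks, here is a minimal Python sketch that summarizes path states per disk. It assumes you captured the output of something like lspath -F "name:parent:connection:status" into a string; the colon-separated field layout, the sample WWPNs/LUN IDs, and the function names are all my assumptions for illustration, so verify the exact -F fields against the lspath man page on your TL. It only looks at path state, not priorities.

```python
from collections import defaultdict

# Hypothetical sample of colon-separated lspath output
# (assumed fields: name:parent:connection:status).
# The WWPNs and LUN IDs below are made up for illustration.
SAMPLE = """\
hdisk2:fscsi0:500a098187f93622,1000000000000:Enabled
hdisk2:fscsi0:500a098197f93622,1000000000000:Enabled
hdisk2:fscsi1:500a098287f93622,1000000000000:Failed
hdisk3:fscsi0:500a098187f93622,2000000000000:Enabled
"""


def summarize_paths(text):
    """Return {hdisk: {status: count}} from colon-separated lspath output."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The connection field contains a comma but no colon,
        # so splitting on ":" yields exactly four fields.
        name, _parent, _connection, status = line.split(":")
        summary[name][status] += 1
    return {disk: dict(counts) for disk, counts in summary.items()}


def disks_with_problem_paths(summary):
    """Return the sorted hdisk names that have any path not in Enabled state."""
    return sorted(
        disk
        for disk, counts in summary.items()
        if any(status != "Enabled" for status in counts)
    )


if __name__ == "__main__":
    result = summarize_paths(SAMPLE)
    for disk in sorted(result):
        print(disk, result[disk])
    print("needs attention:", disks_with_problem_paths(result))
```

Run it before and after a failover test; any hdisk that stays in the "needs attention" list after the paths should have recovered points at the hdisk attributes mentioned above.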
AIX 6.1 TL2 and TL3 here and it works (with some APARs), even for rootvg (in RR mode), but be sure to get a support note stating that you are supported (support matrix/PVR). I'm doing some extensive testing here (including SMSAP, HACMP/PowerHA, PM, VIOS) that includes failing VIOS under a stress test of Oracle/SAP, failovers of LPARs using HACMP/PowerHA, hot backups using SMSAP under Oracle load, restores using PM/Vaulting, etc. Everything seems to be working so far with round-robin. Note: installing an AIX LPAR requires limiting the portset to only 1 path during the NIM/AIX install, mainly because the installation likes to put SCSI reservation locks on the LUNs. So after you install, it is good to clear the locks by using "lun offline" or "vol offline" and then start the box for the first time, or something like this. Be sure to read the host attachment kit for AIX/FC in detail for more details.
The problem with traffic on the cluster interconnect might be that:
1) ALUA failed to initialize the priorities. I'm not sure whether "dotpaths -q" would show you anything interesting in that case.
2) It is normal, because AIX MPIO has a heartbeat that probes every path at the hdisk's MPIO interval (it needs to send a SCSI inquiry for each hdisk down every path). That may be why you are seeing this traffic, especially if you have many LPARs/AIX servers.
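To judge whether possibility 2 could plausibly account for what you see, a rough back-of-the-envelope estimate helps. The Python sketch below assumes one probe per path per hdisk per interval, which is a simplification of my own and not a statement of how AIX actually schedules its health checks; the function name and the example numbers are hypothetical.

```python
def health_check_rate(hosts, hdisks_per_host, paths_per_hdisk, interval_s):
    """Approximate probes per second across all hosts.

    Assumes one probe per path per hdisk per interval (a simplification).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return hosts * hdisks_per_host * paths_per_hdisk / interval_s


if __name__ == "__main__":
    # Hypothetical environment: 10 LPARs, 20 hdisks each,
    # 4 paths per hdisk, probing every 60 seconds.
    rate = health_check_rate(10, 20, 4, 60)
    print(f"about {rate:.1f} probes/s in total")
```

If the ops rate you observe on the interconnect is far above an estimate like this, the heartbeat alone is probably not the explanation and possibility 1 deserves a closer look.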
Helped? I need to start collecting points here.... 😜