I have a case open and am working with NetApp Tech Support, but I figured I'd post here in case someone has heard of this:
For some reason, when I enable the Dynamic Least Queue Depth load balance policy on my NetApp LUN, it does not load balance across HBAs. I can see from my SAN switch monitoring software that traffic ONLY goes across one HBA, and only to one target port on the controller. If I switch to Round Robin, it immediately load balances across both HBAs and all target ports on the controller.
Windows Server 2008 R2 x64 Enterprise
FC Host Utilities 5.2
FC Host Utilities 5.3 (I tested both versions, same results)
NetApp MPIO 3.3.1
NetApp MPIO 3.4 (Both version exhibit same results)
FAS 6280 HA Pair
All Fibre Channel (no iSCSI, NFS, etc.)
Two Brocade Switches
I'm kind of stumped. On one hand, I'd prefer LQD because it's what is recommended to me by pretty much everyone (including NetApp). I'd go to RR, but the issue I see is that in DSM 3.4, RR does not distinguish between optimized and non-optimized paths. The manual pointedly states this is NOT a recommended setting for FCP environments. I'm thinking I may have to resort to RR with Subset (using the non-optimized paths as passive paths).
Just to clarify, this is expected behavior. Our load balance policies honor ALUA first unless explicitly overridden, giving the user control over path selection, as is the case with Round Robin with Subset. We recommend LQD because it keeps I/O off of the non-optimized paths, which is desirable for load balancing at the controller as well, and allows the DSM to select the best path based on the queue depth. Using all available paths is not necessarily the optimal way to load balance, depending on the configuration. If that is indeed desired, RRwS is the way to go. The DSM 4.0 documentation has been updated to provide further clarification.
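To illustrate the distinction described above, here is a minimal sketch of the two policies in pseudocode-style Python. The path names, fields, and helper functions are all hypothetical, invented for illustration; they are not the DSM's real internals. The point is only the selection logic: LQD filters to ALUA-optimized paths first and then picks the shallowest queue, while RRwS cycles over an active subset that the administrator chooses explicitly.

```python
# Illustrative sketch only -- names and structures are hypothetical,
# not taken from the actual NetApp DSM implementation.
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Path:
    name: str
    optimized: bool       # ALUA active/optimized vs. non-optimized
    queue_depth: int = 0  # outstanding I/Os currently on this path

def lqd_select(paths):
    """Least Queue Depth: honor ALUA first, then pick the shallowest queue."""
    # Only fall back to non-optimized paths if no optimized path remains.
    candidates = [p for p in paths if p.optimized] or paths
    return min(candidates, key=lambda p: p.queue_depth)

def rrws_iter(paths, active_names):
    """Round Robin with Subset: the admin names the active set explicitly;
    everything else is standby, used only if all active paths fail."""
    active = [p for p in paths if p.name in active_names]
    return cycle(active)

paths = [
    Path("HBA1->ctrlA", optimized=True,  queue_depth=3),
    Path("HBA2->ctrlA", optimized=True,  queue_depth=1),
    Path("HBA1->ctrlB", optimized=False, queue_depth=0),
    Path("HBA2->ctrlB", optimized=False, queue_depth=0),
]

# LQD skips the idle non-optimized paths and takes the lighter optimized one.
print(lqd_select(paths).name)  # -> HBA2->ctrlA
```

Note that LQD never touches the non-optimized paths here even though their queues are empty, which is exactly the "keeps I/O off of the non-optimized paths" behavior the post describes.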
I've actually revisited this recently. My more recent tests show that LQD does work, but does not load balance the way I'd expect. This began for me quite some time ago, so I don't have all of the details still. I think part of my issue was some misconfiguration or unexpected behavior somewhere in the Brocade switch, MPIO, SnapDrive, or the Windows OS. Part of it as well was a misunderstanding on my part of how LQD works.
I'm in the process of retesting LQD and assuming it works properly I'll start switching servers back over to it. My initial tests show that it fails over as expected when a path disappears and it does appear to use both paths, though not equally (which I think is fine).
If your storage is not being heavily taxed, then all paths should be at a queue depth of 0, so it would make no difference which path it uses. Round Robin might load balance perfectly, but if you have 1% load on 4 paths vs. 4% load on 1 path, the throughput is going to be a wash.
LQD should start spreading across more links when one path becomes busy enough to start queuing. So before assuming it doesn't work, check the queue depth on the LUN that has the bulk of the traffic.
I also tried enabling ALUA, even though it isn't for iSCSI, and it didn't make a difference. Interestingly, I have a smaller LUN which does appear to be load balancing OK? Also, I tried removing the Data ONTAP DSM and setting the MS DSM to LQD, and that is load balancing fine as well??
Hi, just wondering if there was any update on this issue or a fix?
We are experiencing a similar issue in that LQD does not balance the load. As far as I can see, MPIO is all set up and working OK, with individual source IPs and target vifs traversing 2 x separate logical VLANs. Our platform is based on the Hyper-V Server 2008 R2 SP1 OS (similar to Server Core Enterprise), running over Cisco Nexus 5k and 2k switches, linking to Dell blade compute with Dell M6348 switches and FAS6280 controllers. The platform is a Hyper-V cluster.
Round Robin does appear to work OK, but we are following the recommendations in TR-3702, which says LQD is best practice!