ONTAP Discussions

Fabric MetroCluster sporadically selects non-HA paths to shelf

aborzenkov

I believe I have seen a similar problem posted recently.

Fabric MetroCluster (FMC) with 2 x FAS3140, Data ONTAP 7.3.4 (as delivered from the factory), 4 HBAs in use. After a node has fully booted, it complains that some disks are not multipathed. Indeed, out of the 4 available paths, Data ONTAP selects two that go via the same switch (i.e. both to the same side of the shelf, A or B). After unplugging and replugging the A and B channels for this stack several times, it suddenly settles on the correct A/B paths. But it is not clear why it selects suboptimal paths, nor how to force path re-selection by less intrusive means.
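To make the failure mode concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: each path is described only by the initiator port, the fabric switch it traverses and the shelf module (ESH A or B) it ends on. `Path`, `is_redundant`, `pick_redundant_pair` and the port/switch labels are illustrative names, not Data ONTAP identifiers, and this is not Data ONTAP's actual path-selection algorithm. It only shows the difference between two same-switch paths and a "correct A/B" pair chosen from the 4 available paths.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Path:
    """One initiator-to-disk path in a toy fabric MetroCluster model (hypothetical)."""
    hba_port: str    # illustrative initiator port label, e.g. "p1"
    switch: str      # illustrative fabric switch name, e.g. "sw1"
    esh_module: str  # shelf module the path terminates on: "A" or "B"


def is_redundant(pair: Tuple[Path, Path]) -> bool:
    """True if the two paths share neither a switch nor a shelf module."""
    p, q = pair
    return p.switch != q.switch and p.esh_module != q.esh_module


def pick_redundant_pair(paths: Iterable[Path]) -> Optional[Tuple[Path, Path]]:
    """Return the first pair of available paths that is redundant, or None."""
    for pair in combinations(paths, 2):
        if is_redundant(pair):
            return pair
    return None


if __name__ == "__main__":
    # Four available paths: two via sw1 to module A, two via sw2 to module B.
    available = [
        Path("p1", "sw1", "A"),
        Path("p2", "sw1", "A"),
        Path("p3", "sw2", "B"),
        Path("p4", "sw2", "B"),
    ]
    same_switch = (available[0], available[1])  # the "not multipathed" case
    print(is_redundant(same_switch))            # False
    print(pick_redundant_pair(available))       # one path per switch/module
```

In this model, a pair that traverses a single switch also ends on a single shelf module, so losing that switch or module removes both paths at once; that is the situation the "not multipathed" complaint points to.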

Is this a known issue? I am going to update to 7.3.5.1P3 anyway (though I am a bit uneasy, as it is still not explicitly listed in the compatibility matrix); this is the first time the system has been booted since assembly and the switch interconnect were finished.


shaunjurr

Hi,

I had this before too. If you have the switch ports configured correctly (see TR-3548, iirc), then it is probably an ESH and/or disk firmware issue. When you upgrade, have somebody close to the system. Shelf firmware upgrades on fabric-attached MetroClusters have never worked correctly for me, so you might need someone to re-seat the ESH modules to get them to boot correctly. Then, of course, you will have aggregates to resync, etc. It could be a long, rather involved process. I think your Brocade Fabric OS also needs to be at 6.1.1 or so for 7.3.5.1, but you can easily check the matrix for that.

The path selection algorithm for disks on MetroClusters is unfortunately not as refined as one would hope. FWIW, I haven't had the problem for a long time, but it was a real PITA to deal with support when it happened.

Good luck.

aborzenkov

After rechecking the cabling, I realized that it did not correspond to the diagrams in TR-3548. Although nowhere does it state that the cabling shown is mandatory (rather than just an example), I changed it to match the TR exactly (specifically Appendix G, Figure 22). I have not seen the Mixed-HA warning since then.

Of course it may be just a coincidence.

I wish NetApp were more explicit about the requirements. For example, does it matter that port 0a is connected to ESH B and port 0b to ESH A, or could it just as well be reversed? And so on.

shaunjurr

Hi,

The requirements before software disk ownership were even weirder. The documentation for fabric-attached MetroClusters has always been very incomplete. I once made a big stink and got TR-3548 cleaned up a lot, because the information there conflicted with other setup guides. The confusion isn't nice when you are stuck in the middle of an installation, either. But I digress...

Even with normal and stretch MetroClusters, you have a sort of "backwards" hookup of your normal 0a and 0c ports to the B modules. I'm no MetroCluster expert, but we have half a dozen of them in varying configurations. The inflexibility (and, frankly, immaturity) of the disk fabric is still one of the things that irritates me the most... besides the wild experiment with MPO interconnects for stretch clusters...

Anyway, I hope most of the unclear parts of MetroCluster setups are cleared up in TR-3548, but if not, feel free to ask here, open a case, or both...

😉
