We have DOT 9.1 clusters that have 2-3 HA pairs in each cluster. On top of that, each HA pair of said cluster is actually built into a smaller "availability domain" of compute and network resources.
In this environment, we do not want LIFs to fail over to other HA pairs unless absolutely necessary; we want them to stay local to the HA pair if possible. We also have no real need for rolling upgrades, though they are nice to have. If using the broadcast-domain-wide failover policy will generally give us LIF failover within the local HA pair before failing over outside it, I'd rather drop the system-defined policy along with rolling upgrades.
Could anyone clear up for me how these two policies choose the next port to fail over to? It would be much appreciated!
Every environment is unique, and the failover policies include the widest possible set of options to meet any specific need. LIFs will fail over to alternative ports for two reasons: node failure and port failure (either on the storage controller or at the upstream switch). Thus the options have to account for both the causes of failover and redundancy needs.
Your stated use case is to prefer that LIFs stay local to the HA pair. That implies a failover policy of "sfo-partner-only". For a node-based failover, this would seem to make sense on the surface. If a node in an HA pair fails, the data aggregates will go to the HA partner. So let's say we have LIF A on node A, and node A fails over to node B. Having LIF A migrate to a port on node B keeps the access through LIF A local, which is generally preferred. But that failover strategy comes with a potential price.
Consider that node A and node B each have a port channel to the upstream switches, with matching VLAN configurations, etc. In that setup it would make sense for LIF A, running on the port channel from node A, to fail over to the matching port channel on node B. Assume also that, in the interests of redundancy, you have a second LIF in the same SVM that normally runs on node B and is used for direct access to the data normally on node B. It also provides a small measure of redundancy, in that you have two access points to data on the same SVM. But after this failover, both access points are running on a single port channel. If something were to happen to that port channel (or single port, if so configured), both access points to the local HA pair would go down together. Without a conscious design to ensure there is additional redundancy, there is a small loophole through which all access to some data could be lost.
You might consider this a convoluted way to describe the coupling of failure domains. Another combination of events might be network switch maintenance that purposely takes down specific links, then having the "other" node fail unexpectedly severing both access points.
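As a rough illustration of that coupling, here is a toy Python sketch. It is not how ONTAP actually places LIFs; the node, port, and LIF names are made up, and the rule "a LIF on a failed node moves to the same-named port on its HA partner" is an assumption standing in for sfo-partner-only behaviour in the scenario above.

```python
def place_lifs(home_ports, partner_of, failed_nodes):
    """Return {lif: (node, port)} after the given nodes have failed.

    Toy rule (an assumption, not ONTAP's algorithm): a LIF whose home node
    has failed moves to the same-named port on that node's HA partner.
    """
    placement = {}
    for lif, (node, port) in home_ports.items():
        if node in failed_nodes:
            node = partner_of[node]
        placement[lif] = (node, port)
    return placement


# Hypothetical two-node HA pair, one port channel (a0a) per node.
home_ports = {"lif_a": ("node_a", "a0a"), "lif_b": ("node_b", "a0a")}
partner_of = {"node_a": "node_b", "node_b": "node_a"}

before = place_lifs(home_ports, partner_of, failed_nodes=set())
after = place_lifs(home_ports, partner_of, failed_nodes={"node_a"})

# Number of distinct port channels carrying the SVM's two LIFs.
print(len(set(before.values())))  # 2
print(len(set(after.values())))   # 1
```

Once node A is down, both LIFs sit on node B's port channel, so a single link problem there takes out both access points at the same time.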
That's where system-defined comes into play. System-defined defaults to auto-migrating a LIF to a separate HA pair intentionally in an effort to keep the migrated LIF as independent as possible from any associated LIF on the HA partner node. Granted, the level of separation might not be great, but the system tries to keep it as wide as possible.
Finally, broadcast-domain-wide is the failover mode used to keep communication as highly available as possible for those LIFs that by nature can only be single pointed. For instance, it really doesn't matter where the cluster-management interface lives, but you want as wide availability as possible so that you always have access to the primary cluster interface.
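To compare the policies side by side, here is a small Python sketch of which ports each one would consider as candidates. It is a simplified model based on the descriptions in this thread, not ONTAP's implementation: the `Port` type, the node and port names, and the `eligible_targets` helper are all made up for illustration. In particular, it treats every node outside the HA pair as a candidate for system-defined, and it says nothing about which candidate ONTAP would pick first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    node: str
    ha_pair: str
    name: str


def eligible_targets(policy, home, ports):
    """Return the ports (other than `home`) that `policy` allows as candidates.

    Simplified model of each policy's scope; the order of the result
    carries no meaning.
    """
    if policy == "local-only":
        keep = lambda p: p.node == home.node
    elif policy == "sfo-partner-only":
        keep = lambda p: p.ha_pair == home.ha_pair
    elif policy == "system-defined":
        # Home node, plus nodes that are NOT in the home node's HA pair.
        keep = lambda p: p.node == home.node or p.ha_pair != home.ha_pair
    elif policy == "broadcast-domain-wide":
        keep = lambda p: True
    else:
        raise ValueError(f"unknown policy: {policy}")
    return [p for p in ports if p != home and keep(p)]


# Hypothetical cluster: two HA pairs, two ports per node, one broadcast domain.
layout = {"ha1": ["n1", "n2"], "ha2": ["n3", "n4"]}
ports = [Port(node, ha, name)
         for ha, nodes in layout.items()
         for node in nodes
         for name in ("e0c", "e0d")]
home = ports[0]  # n1:e0c

for policy in ("local-only", "sfo-partner-only",
               "system-defined", "broadcast-domain-wide"):
    nodes = sorted({p.node for p in eligible_targets(policy, home, ports)})
    print(policy, nodes)
```

In this model sfo-partner-only never leaves the HA pair, system-defined skips the HA partner entirely, and broadcast-domain-wide allows everything, which is why it cannot by itself promise "local HA pair first".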
The various failover modes give you the ability to use as much or as little freedom in network failure domain design as you desire. For those with basic needs, default settings generally lean to the side of availability and as much functionality as possible (such as update strategies). In your case where it sounds like you have very specific network design requirements, it is sensible that some of the design choices don't have value in your environment.
Thanks for the great explanation and quick response!
I have one follow-up question, if you don't mind. I do understand what you have said about the use cases for each of these policies and how they would be used in certain situations. I wonder about the "broadcast-domain-wide" failover policy. If the active port were to fail, is there any way to know where the LIF will fail over to in the cluster? Does it have a preferred location, like local-first, or just next-available? Or is it a completely random port from the failover group?
For each of the failover policies I imagine there is an algorithmic determination of the "best" one in some fashion. There could be multiple choices of "landing" port in any of the policies if enough ports are configured on a given node/resource.
Sorry, but I don't have any idea how ONTAP selects between multiple potential port failover targets.
The "system-defined" policy will fail LIFs over to the home port or to non-HA ports. However, we have a few vservers in which some LIFs use defined "failover groups" as the failover-group, while other LIFs within the same vserver use the "broadcast domain" as the failover-group. Both cases use the "system-defined" policy.
Further, because the "failover group" used by these LIFs defines ports in the same HA pair as the failover targets, these LIFs will fail over to ports in the same HA pair. That would be against what "system-defined" is supposed to do.
Should I replace the "failover group" with the broadcast domain, so that all LIFs in this vserver use the broadcast domain? Will this change cause any issues?
My second question: if I use the broadcast domain as the failover-group, are there any use cases where I would still need a separate "failover group"?