Cluster Failover on link communication failure

joostvandrenth · ‎2011-03-14

We were wondering whether a VIF failure would lead to a cluster failover. The first step would be to enable CF takeover on network failure and set NFO on the appropriate interface.

Our setup involves a core switch, with servers, clients and storage connecting through other, intermediary switches. This means a LINK FAILURE as such might not occur on the switch the NetApp controllers are connected to, while the underlying connection between the central and decentral switches is affected. Is there a way to detect a failure of this kind, or to initiate a failover on it? This would be more a failure of communications than a direct link failure.

Same question for a multilevel VIF (a multimode connection to one stack, with a single-mode VIF on top connecting to another network stack): will it fail over when there is not a direct link failure but an underlying one?

Darkstar · ‎2011-03-20

By default the filer will not take over on a multiple-link failure. To enable it, set the option "cf.takeover.on_network_interface_failure" to on.

Note that this only applies to interfaces that have the "nfo" (negotiated failover) flag set, i.e. you have to change your /etc/rc to include this flag in the ifconfig command line. (The "-nfo" form of the flag disables negotiated failover.)

The other filer will take over when ALL of these marked interfaces are down. To change it to take over whenever ANY of the flagged interfaces is down, set the option "cf.takeover.on_network_interface_failure.policy" to "any_nic".
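
As a rough sketch of the settings described above: the interface name vif1 and the address/netmask are made up for illustration, and the "#" lines are only annotations. Check the "options" and "ifconfig" man pages for your Data ONTAP release before using any of this.

```
# On each controller: allow takeover on network interface failure
options cf.takeover.on_network_interface_failure on

# Optional: take over when ANY flagged interface is down
# (otherwise takeover happens only when ALL flagged interfaces are down)
options cf.takeover.on_network_interface_failure.policy any_nic

# In /etc/rc: flag the interface for negotiated failover.
# vif1 and 192.0.2.10/255.255.255.0 are placeholders.
# "nfo" enables negotiated failover; "-nfo" disables it.
ifconfig vif1 192.0.2.10 netmask 255.255.255.0 partner vif1 nfo
```

Keep in mind that this only reacts to the state of the filer's own interfaces, so it will not notice a failure further upstream in the network.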

See the manual page for the "options" command for more info, as well as the active/active admin guide.

-Michael

joostvandrenth · ‎2011-03-31

Maybe the question was somewhat vague. If the switch to which the NetApp HA pair is connected (it being an edge device) does not directly suffer a failure, but other connections to the central network core do (resulting in loss of communication between the servers and clients and the storage), am I able to 'catch' this failure and instigate a failover without manual intervention?

rwelshman · ‎2011-04-01

Ok, so, is it that you are trying to determine how to deal with the problem:

Server A -> Connecting to Filer A

Server A still on the network and Filer A still on the network but Server A can't find a route to Filer A anymore?

I don't think NetApp really plans for that kind of error as it is more of a network redundancy issue.

Would Server A be able to get to Filer B in that case anyway?

shaunjurr · ‎2011-04-03

Hi,

Basically, if you need this kind of redundancy, you simply need to have a redundant switch infrastructure. A correctly configured and redundant core network will deal with failovers of such elements. Assuming you are using STP in your network, new paths will be calculated.

The sort of failover you are trying to achieve via some NetApp functionality is fundamentally not a job for a host/server (a NAS unit is just an advanced server appliance) and would be far too complex for a host to resolve. That is why there are network protocols that solve these problems transparently for all hosts on the network at the same time. ONTAP can deal with local link failures (STP can't help here unless the host is connected to multiple active links) and that is as much knowledge as it really needs to have about the network.

Most of this is also described in the Network Configuration Guide and Best Practices "TR".

WALLBREAKER · ‎2012-07-26

Note that this only applies to interfaces that have the "nfo" (negotiated failover) flag set, i.e. you have to change your /etc/rc to include this flag in the ifconfig command line.