I have a cluster simulator running, on which I am trying to test the Negotiated Failover (NFO) functionality for network interfaces. As background, NFO can be enabled on a physical interface and configured so that if an NFO-enabled interface fails on the partner, a CF (cluster failover) event occurs (at least in theory).
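For reference, here is a minimal sketch of how NFO is typically switched on in Data ONTAP 7-mode. It assumes the standard `ifconfig <interface> nfo` syntax and the simulator's ns0 interface name. These are not necessarily the exact commands behind the output in this post, and the settings would normally also need to be added to /etc/rc to survive a reboot:

```
# Assumed 7-mode syntax; run the equivalent on both nodes.
node1> ifconfig ns0 nfo
node1> options cf.takeover.on_network_interface_failure on
node1> options cf.takeover.on_network_interface_failure.policy any_nic
```

Afterwards, `ifconfig -a` should list "nfo enabled" under the interface and `cf status` should report negotiated failover as enabled, which is what the output in this post shows.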
Here is the node1 configuration of the cluster:
node1> options cf
cf.giveback.auto.cifs.terminate.minutes 5
cf.giveback.auto.enable off
cf.giveback.auto.terminate.bigjobs on
cf.giveback.check.partner off
cf.takeover.change_fsid on
cf.takeover.detection.seconds 10
cf.takeover.on_disk_shelf_miscompare off
cf.takeover.on_failure on
cf.takeover.on_network_interface_failure on
cf.takeover.on_network_interface_failure.policy any_nic (same value in local+partner recommended)
cf.takeover.on_panic on
cf.takeover.on_short_uptime on
node1> ifconfig -a
ns0: flags=848043<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.97.133 netmask 0xffffff00 broadcast 192.168.97.255
partner inet 192.168.97.135 (not in use)
ether 00:50:56:1b:03:f8 (Linux AF_PACKET socket)
nfo enabled
ns1: flags=808042<BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 00:50:56:1c:03:f8 (Linux AF_PACKET socket)
lo: flags=1948049<UP,LOOPBACK,RUNNING,MULTICAST,TCPCKSUM> mtu 4064
inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1
ether 00:00:00:00:00:00 (Shared memory)
node1> cf status
Cluster enabled, node2 is up.
Negotiated failover enabled (network_interface).
node1>
And here's the node2 configuration of the cluster:
node2> cf status
Cluster enabled, node1 is up.
Negotiated failover enabled (network_interface).
node2> options cf
cf.giveback.auto.cifs.terminate.minutes 5
cf.giveback.auto.enable off
cf.giveback.auto.terminate.bigjobs on
cf.giveback.check.partner off
cf.takeover.change_fsid on
cf.takeover.detection.seconds 10
cf.takeover.on_disk_shelf_miscompare off
cf.takeover.on_failure on
cf.takeover.on_network_interface_failure on
cf.takeover.on_network_interface_failure.policy any_nic (same value in local+partner recommended)
cf.takeover.on_panic on
cf.takeover.on_short_uptime on
node2> ifconfig -a
ns0: flags=808042<BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.97.135 netmask 0xffffff00 broadcast 192.168.97.255
partner inet 192.168.97.133 (not in use)
ether 00:50:56:0f:25:e3 (Linux AF_PACKET socket)
nfo enabled
ns1: flags=808042<BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 00:50:56:10:25:e3 (Linux AF_PACKET socket)
lo: flags=1948049<UP,LOOPBACK,RUNNING,MULTICAST,TCPCKSUM> mtu 4064
inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1
ether 00:00:00:00:00:00 (Shared memory)
node2> cf status
Cluster enabled, node1 is up.
Negotiated failover enabled (network_interface).
node2>
However, when I bring down the ns0 interface on a node, nothing really happens:
node2> date; ifconfig ns0 down
Sun Aug 10 14:43:53 GMT 2008
node2> ifconfig -a
ns0: flags=808042<BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.97.135 netmask 0xffffff00 broadcast 192.168.97.255
partner inet 192.168.97.133 (not in use)
ether 00:50:56:0f:25:e3 (Linux AF_PACKET socket)
nfo enabled
ns1: flags=808042<BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 00:50:56:10:25:e3 (Linux AF_PACKET socket)
lo: flags=1948049<UP,LOOPBACK,RUNNING,MULTICAST,TCPCKSUM> mtu 4064
inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1
ether 00:00:00:00:00:00 (Shared memory)
node2> ping 192.168.97.133
ping: wrote 192.168.97.133 64 chars, error=Network is down
ping: wrote 192.168.97.133 64 chars, error=Network is down
ping: wrote 192.168.97.133 64 chars, error=Network is down
ping: wrote 192.168.97.133 64 chars, error=Network is down
ping: wrote 192.168.97.133 64 chars, error=Network is down
(coffee break...)
node2> date
Sun Aug 10 14:51:29 GMT 2008
node2> cf status
Cluster enabled, node1 is up.
Negotiated failover enabled (network_interface).
node2>
What am I missing?