ONTAP Discussions

FAS2720 & AFF220 4 Node cluster

Hi,
We are having an issue with the following setup:
- a 4-node cluster with node management IPs, say X.X.X.1 to X.X.X.4
- one cluster management LIF at X.X.X.5
 
The issue: when we unplug the management cable from X.X.X.1, the other nodes do not take over the cluster management LIF automatically (the cluster management IP becomes unreachable) and we have to move it to another node manually. Please guide us on what we are missing here.
Please find the attached Screenshot.
Thanks.
 
5 REPLIES

Re: FAS2720 & AFF220 4 Node cluster

Hi,

 

What is the output of this cmd:

::> network interface show -failover -lif cluster_mgmt


Also, could you elaborate on "we have to manually shift it to other node"? When you unplug the cable, do you lose the SSH connection? If so, how do you manually move the LIF to the other node?

 

Thanks!

Re: FAS2720 & AFF220 4 Node cluster

output result attached

::> network interface show -failover -lif cluster_mgmt

Re: FAS2720 & AFF220 4 Node cluster

From the screenshot: it looks correct and standard. cluster_mgmt is able to fail over to ports on all nodes in the failover group (node-mgmt and data ports), and the failover group and policy are the defaults, as they should be.

 

In the screenshot, I see that the failover targets are presented in this order:
1) cluster_1:e0M is first. We can simulate that port going down:

::> set adv
::*> network port modify -node cluster_1 -port e0M -up-admin false

The LIF should fail over to e0c; if that is unavailable, then to e0d, and so on through e0f. If none of the ports on cluster_1 is available, it will move to a different node, i.e. cluster_2:e0M.

 

You can test it out and let us know.

 

::*> network interface show -role cluster-mgmt

This should show the current node/port of the failover target, and is-home should say 'false'.
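Once you have confirmed the failover worked, a possible way to undo the simulated failure and bring the LIF back home (assuming the same node/port names as above) is:

```
::*> network port modify -node cluster_1 -port e0M -up-admin true
::*> network interface revert -lif cluster_mgmt
::*> set admin
```

`network interface revert` sends the LIF back to its home node/port once that port is up again; afterwards, is-home should show 'true'.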

Re: FAS2720 & AFF220 4 Node cluster

This is a standard setup issue. The only ports that should be in that list (in other words, based on the output, in the Default broadcast domain) are connected ports on the same physical network.

 

Different customers have different setups. At a minimum, the broadcast domain should include e0M from each node. *IF* e0c/e0d/e0e/e0f are connected and on the same physical network as e0M (whatever network that is, e.g. 192.168.1.1 - 192.168.1.4), then failover will work. If they are not, it is entirely possible that when the port fails (or the plug is pulled) the LIF moves to one of those ports and advertises there (via gratuitous ARP) that the IP address has moved, even though that port cannot actually reach the management network.

 

I have seen this happen before, and the cluster became unreachable through the cluster_mgmt LIF.

 

Please correct your Broadcast-domain(s) and try again.
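As a sketch, assuming for illustration that the out-of-place ports are cluster_1's e0e/e0f (substitute your actual node and port names from the output), inspecting and correcting the Default broadcast domain would look something like:

```
::> network port broadcast-domain show -broadcast-domain Default
::> network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports cluster_1:e0e,cluster_1:e0f
```

Removed ports land in the Default ipspace's pool and can later be added to the correct broadcast domain with `network port broadcast-domain add-ports`.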

 

Typical broadcast domains separate things out. For example:

 

Default (MTU 1500):

node1:e0M

node2:e0M

node3:e0M

node4:e0M

 

NFS (MTU 9000):

node1:a0a-101

node2:a0a-101

node3:a0a-101

node4:a0a-101

 

CIFS (MTU 1500):

node1:a0a-201

node2:a0a-201

node3:a0a-201

node4:a0a-201
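As a hedged example, a layout like the NFS domain above could be created with something like the following (the VLAN ID 101, the a0a ifgrp name, and the node names are just placeholders from the example):

```
::> network port broadcast-domain create -ipspace Default -broadcast-domain NFS -mtu 9000 -ports node1:a0a-101,node2:a0a-101,node3:a0a-101,node4:a0a-101
```

The CIFS domain would be created the same way with the a0a-201 ports and MTU 1500.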

 

Provide more details if this does not work.

 

Suggestions include:

"broadcast-domain show ; ifgrp show"

Also

"net int show -failover"

 

(But please try to copy/paste the output instead of posting a picture if you can. I know some places cannot, but when possible it is much easier to read!)

 
