Network traceroute from two nodes to the cluster-mgmt LIF fails


I have 6 nodes in the cluster. The two nodes in one HA pair fail to reach the cluster management LIF when I run "network traceroute -node prod-node1 -destination cluster_mgmt_ip". The other 4 nodes work fine.


Can somebody here please shed some light on this issue?



It could be any number of issues, on the storage side or the switch side. The steps below assume that the cluster management IP and the node management IPs are on the same subnet / VLAN / broadcast domain.

1) Verify that the LIFs are configured as you expect and are located where you expect them to be: IP addresses, subnets, ports.

::> network interface show -role node-mgmt|cluster-mgmt


2) Verify that the ports the LIFs currently reside on are in the correct broadcast domains.

::> network port show


3) Identify the network switch ports the nodes are connected to, and check whether the configuration of those switch ports matches the configuration on the storage side.

::> system node run -node * -command cdpd show-neighbors
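
As a rough sketch of steps 1 and 2, the output can be narrowed with -fields so that a LIF sitting away from its home port, or a port in an unexpected broadcast domain, stands out. The field selection below is only a suggestion, and the available fields may differ between ONTAP versions:

::> network interface show -role node-mgmt|cluster-mgmt -fields address,netmask,home-node,home-port,curr-node,curr-port,is-home
::> network port show -fields ipspace,broadcast-domain,link,mtu

If is-home shows false for a LIF, it is currently not on its home port, so check whether the port it moved to is actually on the management network.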


First do this (turn on CDP/LLDP):

system node run -node * options cdpd.enable on
system node run -node * options lldp.enable on

Wait for 3 minutes for LLDP/CDP to talk.

Then do this (assuming all your MGMT and ClusMGMT are on e0M):

network device-discovery show -port e0M

This should show which switches and switch ports your e0M ports are connected to. Maybe you are connected to the wrong switch. Maybe the access VLAN is set incorrectly on the switch port.


Try testing connectivity to the gateway:

network ping -lif <node-mgmt> -vserver <admin-svm> -destination <gateway_ip>

Repeat for each node-mgmt LIF.


Check your Broadcast-Domains

broadcast-domain show

You should see one (unless you renamed it, it will be called Default) that includes the e0M ports.

Make sure ***ALL THE PORTS*** in that broadcast domain are correct! When adding in new nodes, *ALL* ports get added to the Default broadcast-domain. You may need to remove ports that should not be there!
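
If a port does turn out not to belong there, it can be taken out of the broadcast domain along these lines. This is a sketch only: it assumes the Default IPspace and a broadcast domain named Default, <node>:<port> is a placeholder, and newer ONTAP releases may handle broadcast domain membership differently, so check the command reference for your version first:

network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports <node>:<port>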


Check the LIFs failover ports:

network interface show -role node-mgmt|cluster-mgmt -failover

VERIFY the ports listed are correct. If not, the broadcast-domain is incorrect!


Finally, check failover-policy:

network interface show -role node-mgmt|cluster-mgmt -fields failover-policy

All the NODE-MGMT LIFs should be local-only. The Cluster-MGMT LIF could be system-defined. I would modify it to broadcast-domain-wide:

network interface modify -lif <cluster_mgmt> -vserver <admin_svm> -failover-policy broadcast-domain-wide

Hope that helps. If you still have trouble, please consider posting some of the output from the commands above here.