Cannot join node to cluster

Hello...  I am having fits with CDOT vSIM 8.2.1, trying to join a second node to the cluster.  Here is what I have done:

1.set, and a net switch and port set.  The cluster set does not have a NIC assigned, but the port set does.

2.     Installed vSIM CDOT 8.2.1

3.     Converted to single vmdk file

4.     Converted to template

5.     Deployed three vSIM boxes, dctcf01-01, dctcf01-03, and dctcf01-05 (leave even numbers for possible HA testing)

5.     Using systemshell, removed existing discs, and created four shelves of 14 type 36 discs.

6.     On reboot went to VLOADER and:
          a. Changed SYS_SERIAL_NUM of nodes three and five [Node01 - 4082368511, 4082368513, 4082368515]
          b. Changed bootarg.nvram.sysid of nodes three and five [Node01 - 4082368511, 4082368513, 4082368515]

7.     Created cluster dctcf01 on node 01

8.     Tried to add dctcf01-03 to cluster

          During Node Check, received error:  "Cluster join membership failed."

          "Restarting Cluster Setup"

I am befuddled as to what to do next.  I've been playing with this for weeks now.  Does anyone have an answer?

Re: Cannot join node to cluster

Sorry, line #1 should have stated one vswitch for the cluster (cdot) and one for vmnet use.

Also, could the problem be the dashes in the system serial number?  The copy I have does not have dashes, so I did not enter any on the additional nodes.

Re: Cannot join node to cluster

Were they ever powered on before the 2nd step 5? (hoping for a no)

Also step 6 needs to come before the 2nd step 5. 

Re: Cannot join node to cluster

Yes.  They were powered on so I could go into the systemshell and modify the disc configuration.  Basically I deleted all entries in ",disks" and used vsim_mkdisk to create four shelves of 9TiB SAS discs.

Then shut down, changed System Serial and SysId, re-initialized array with new root vol.

Re: Cannot join node to cluster

Interesting things happen during the first boot.  Some of it happens in /var, which persists across a boot menu option 4/4a.

A bootmenu->wipeconfig may clear it out.  But you'd be better off starting clean.

Change the serial & sysid first (very first boot), then go into the systemshell and change the disk population.

The easy way though is to catch the first boot, set the mode, set the serial, set the sysid, arm vdevinit with the desired disk population, then let it boot.

Re: Cannot join node to cluster

Sean, thanks for responding.

But what do you mean by:

1. Set the mode

2. arm vdevinit

Re: Cannot join node to cluster

This sort of thing seems to come up repeatedly, so I wrote a doc:

https://communities.netapp.com/docs/DOC-33502
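
Not a substitute for the doc, but as a rough sketch of the "set the serial, set the sysid, arm vdevinit" part: the small Python script below just prints the lines you would type at the VLOADER prompt on first boot. SYS_SERIAL_NUM and bootarg.nvram.sysid are the variables already named in this thread. The `setenv` syntax, the two vdevinit variable names (bootarg.vm.sim.vdevinit and bootarg.sim.vdevinit) and the "type:count:shelf" format are assumptions from memory, so check them against the doc before relying on them. Setting the mode is not covered here.

```python
def vloader_commands(serial, sysid, disk_type=36, disks_per_shelf=14, shelves=4):
    """Return the lines to type at VLOADER on the very first boot of one node."""
    # One "type:count:shelf" entry per simulated shelf (format is an assumption).
    population = ",".join(
        f"{disk_type}:{disks_per_shelf}:{shelf}" for shelf in range(shelves)
    )
    return [
        f"setenv SYS_SERIAL_NUM {serial}",
        f"setenv bootarg.nvram.sysid {sysid}",
        # Variable names below are assumptions; verify against the linked doc.
        f'setenv bootarg.vm.sim.vdevinit "{population}"',
        f'setenv bootarg.sim.vdevinit "{population}"',
        "boot",
    ]


# Values taken from the original post; node 01 keeps its default identity.
nodes = {"dctcf01-03": "4082368513", "dctcf01-05": "4082368515"}

for name, ident in nodes.items():
    print(f"# {name}")
    print("\n".join(vloader_commands(ident, ident)))
```

The defaults mirror the four shelves of 14 type 36 discs from the original post. The point is that every node gets a unique serial and sysid before it ever boots, and the disk population is set up in the same pass.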

Re: Cannot join node to cluster

Thanks Sean.  I've never seen this process anywhere, I will give it a try. 

Re: Cannot join node to cluster

Didn't work, Sean.  I followed the procedure and created the cluster on one node just fine.

Then tried to join on second node.  It found the cluster, showed me the name and asked me if I wanted to join.

When it tried to join the node, it got the same error, "cluster join membership failed".

Thanks, I appreciate your effort.

Tas

Re: Cannot join node to cluster

Troubleshooting cluster join is interesting.  Cluster join can fail for a bunch of reasons, most of them problems on the cluster network.  All cluster LIFs must be able to communicate with all other cluster LIFs, MTUs must match, and packets can't be fragmented by the switch.  In the sim, use MTU 1500 and put all the cluster network ports on the same isolated network.
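
If it helps to sanity check the settings on paper first, here is a minimal Python sketch of the two rules above: every cluster LIF has the same MTU, and they all sit on one network. The `ClusterLif` record, the `check_cluster_lifs` helper and the sample addresses are made up for illustration. It only checks values you copy in by hand and does not talk to the nodes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class ClusterLif:
    node: str
    port: str
    address: str  # address with prefix length, e.g. "169.254.10.1/16"
    mtu: int


def check_cluster_lifs(lifs, expected_mtu=1500):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    networks = set()
    for lif in lifs:
        if lif.mtu != expected_mtu:
            problems.append(
                f"{lif.node}:{lif.port} has MTU {lif.mtu}, expected {expected_mtu}"
            )
        networks.add(ipaddress.ip_interface(lif.address).network)
    if len(networks) > 1:
        subnets = ", ".join(sorted(str(net) for net in networks))
        problems.append(f"cluster LIFs span multiple subnets: {subnets}")
    return problems


# Sample values are invented; substitute what each node actually reports.
lifs = [
    ClusterLif("dctcf01-01", "e0a", "169.254.10.1/16", 1500),
    ClusterLif("dctcf01-01", "e0b", "169.254.10.2/16", 1500),
    ClusterLif("dctcf01-03", "e0a", "169.254.20.1/16", 9000),
]

for problem in check_cluster_lifs(lifs):
    print(problem)
```

A clean result here does not prove the join will work, since the vswitch can still drop or fragment traffic, but a mismatch is worth fixing before trying the join again.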

Check the vswitch/virtual network setup.  Cluster ports should be on the cluster network (which you may need to create) or on the host-only net if you run them on Workstation.  e0a/e0b are the default cluster network ports on the sims.

During the failed attempt it should have auto-assigned IPs to the cluster LIFs.  On the node that failed to join, can you ping the cluster LIFs on the working node?

Were there any other errors or warnings during the join?