Simulator Discussions

Cannot join node to cluster

Hello...  I am having fits with CDOT vSIM 8.2.1, trying to join a second node to the cluster.  Here is what I have done:

1.set, and a net switch and port set.  The cluster set does not have a NIC assigned, but the port set does.

2.     Installed vSIM CDOT 8.2.1

3.     Converted to single vmdk file

4.     Converted to template

5.     Deployed three vSIM boxes, dctcf01-01, dctcf01-03, and dctcf01-05 (leave even numbers for possible HA testing)

5.     Using systemshell, removed existing discs, and created four shelves of 14 type 36 discs.

6.     On reboot went to VLOADER and:

          a. Changed SYS_SERIAL_NUM of nodes three and five [Node01 - 4082368511, 4082368513, 4082368515]

          b. Changed bootarg.nvram.sysid of nodes three and five [Node01 - 4082368511, 4082368513, 4082368515]

7.     Created cluster dctcf01 on node 01

8.     Tried to add dctcf01-03 to cluster

          During Node Check, received error:  "Cluster join membership failed."

          "Restarting Cluster Setup"

I am befuddled as to what to do next.  I've been playing with this for weeks now.  Does anyone have an answer?

Re: Cannot join node to cluster

Sorry, line #1 should have stated one vswitch for the cluster (cdot) and one for vmnet use.

Also, could the problem be the dashes in the system serial number?  The copy I have does not have dashes, so I did not enter any on the additional node.

Re: Cannot join node to cluster

Were they ever powered on before the 2nd step 5? (hoping for a no)

Also step 6 needs to come before the 2nd step 5. 

Re: Cannot join node to cluster

Yes.  They were powered on so I could go in to systemshell and modify the disc configuration.  Basically deleted all entries in ",disks" and used vsim_mkdisk to create four shelves of 9TiB SAS discs.

Then shut down, changed System Serial and SysId, re-initialized array with new root vol.

Re: Cannot join node to cluster

Interesting things happen during the first boot.  Some of it happens in /var, which persists across a 4/4a.

A bootmenu->wipeconfig may clear it out.  But you'd be better off starting clean.

Change the serial & sysid first (very first boot), then go into the systemshell and change the disk population.

The easy way though is to catch the first boot, set the mode, set the serial, set the sysid, arm vdevinit with the desired disk population, then let it boot.
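Since every node needs its own serial number and sysid, a quick check that no two nodes share a value can catch a cloning mistake before the join is attempted. The following is only a minimal sketch: the node-to-value mapping is an assumption (the thread lists the three numbers but not which node received which), and the helper name `duplicate_ids` is hypothetical. Dashes are stripped before comparing so that dashed and undashed forms of the same number are treated as equal.

```python
from collections import Counter

# Assumed mapping: node name -> value used for SYS_SERIAL_NUM and
# bootarg.nvram.sysid. The thread gives the numbers, not the assignment.
NODE_IDS = {
    "dctcf01-01": "4082368511",
    "dctcf01-03": "4082368513",
    "dctcf01-05": "4082368515",
}


def duplicate_ids(node_ids):
    """Return {identifier: [nodes]} for identifiers used by more than one node.

    Dashes are removed before comparison so "1-1" and "11" count as the same.
    """
    normalized = {node: value.replace("-", "") for node, value in node_ids.items()}
    counts = Counter(normalized.values())
    return {
        value: sorted(node for node, v in normalized.items() if v == value)
        for value, count in counts.items()
        if count > 1
    }


if __name__ == "__main__":
    clashes = duplicate_ids(NODE_IDS)
    if clashes:
        for value, nodes in clashes.items():
            print(f"identifier {value} is shared by: {', '.join(nodes)}")
    else:
        print("all node identifiers are unique")
```

This only compares values typed into the script; it does not read anything from the simulator itself.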

Re: Cannot join node to cluster

Sean, thanks for responding.

But what do you mean by:

1. Set the mode

2. arm vdevinit

Re: Cannot join node to cluster

This sort of thing seems to come up repeatedly, so I wrote a doc:

https://communities.netapp.com/docs/DOC-33502

Re: Cannot join node to cluster

Thanks, Sean.  I've never seen this process anywhere; I will give it a try.

Re: Cannot join node to cluster

Didn't work, Sean.  I followed the procedure and created the cluster on one node just fine.

Then tried to join on second node.  It found the cluster, showed me the name and asked me if I wanted to join.

When it tried to join the node, it got the same error, "cluster join membership failed".

Thanks, I appreciate your effort.

Tas

Re: Cannot join node to cluster

Troubleshooting cluster join is interesting.  Cluster join can fail for a bunch of reasons, most of them problems on the cluster network.  All cluster lifs must be able to communicate with all other cluster lifs, MTUs must match and packets can't be fragmented by the switch.  In the sim, use MTU 1500 and put all the cluster network ports on the same isolated network.

Check the vswitch/virtual network setup.  Cluster ports should be on the cluster network (which you may need to create) or on the host-only net if you run them on workstation.  e0a/e0b are default cluster network ports on the sims.  
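The two conditions above (matching MTUs, and all cluster ports on one isolated network) can be written down as a small consistency check over a hand-maintained inventory. This is a sketch only: the inventory format, the network name "cluster-vswitch", and the function `check_cluster_ports` are all hypothetical, and the values are typed in by hand rather than read from ONTAP or ESXi.

```python
# Hypothetical hand-entered inventory of cluster ports. The network name
# "cluster-vswitch" is a placeholder for whatever the isolated vswitch is called.
PORTS = [
    {"node": "dctcf01-01", "port": "e0a", "mtu": 1500, "network": "cluster-vswitch"},
    {"node": "dctcf01-01", "port": "e0b", "mtu": 1500, "network": "cluster-vswitch"},
    {"node": "dctcf01-03", "port": "e0a", "mtu": 1500, "network": "cluster-vswitch"},
    {"node": "dctcf01-03", "port": "e0b", "mtu": 1500, "network": "cluster-vswitch"},
]


def check_cluster_ports(ports):
    """Return a list of problem descriptions; an empty list means consistent."""
    problems = []

    # Every cluster port should use the same MTU.
    mtus = {p["mtu"] for p in ports}
    if len(mtus) > 1:
        problems.append(f"MTU mismatch: {sorted(mtus)}")

    # Every cluster port should sit on the same isolated network.
    networks = {p["network"] for p in ports}
    if len(networks) > 1:
        problems.append(f"cluster ports span multiple networks: {sorted(networks)}")

    return problems


if __name__ == "__main__":
    issues = check_cluster_ports(PORTS)
    print("\n".join(issues) if issues else "cluster port inventory looks consistent")
```

A clean result here says only that the recorded settings agree with each other; it cannot show whether packets are actually being fragmented or dropped on the vswitch.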

During the failed attempt it should have auto assigned IPs to the cluster lifs.  On the node that failed to join, can you ping the cluster lifs on the working node?

Were there any other errors or warnings during the join?

Re: Cannot join node to cluster

Sean thanks for taking all of this time dude.

MTUs are at the default of 1500.  The vSwitch is on the ESXi host, and there are no other devices on it except the cluster NICs of the two cluster nodes, e0a and e0b, as you said.

Yes, during the failed attempt, the second node assigned IPs.  I can ping both e0a and e0b from either of the two nodes.

No other errors, and I didn't see anything in the events.

Anyway, hopefully one of my CSEs may be able to help soon.  We've been tasked with deploying CDOT, so we need to learn and design the new architecture.

Thanks

Tas

Re: Cannot join node to cluster

Can you post the following output from each node?

::> net port show -instance

::> net int show -role cluster -instance

Re: Cannot join node to cluster

https://discovery.box.com/s/9pe6ldy8pwuqzlbjhd4y

Thank you, Parisi.  I've saved the output to the above BOX link.  Node 01 has a management address because it formed the cluster, but 02 doesn't, so I had to cut and paste screen shots from the VMware console.

So you will see a .txt file from Node 01, but an .rtf with screen shots from node 02.

Tas

Re: Cannot join node to cluster

It's asking me to sign in, and I don't have a Box account.

Can you attach to this forum?

Re: Cannot join node to cluster

It appears my company no longer allows open access to BOX folders.  I've tried to attach it here, but the forum editor only allows me to attach links, the A with the chain link next to it.

Do you know of another way?

Re: Cannot join node to cluster

Do you have an MS Live/Outlook.com account?  I can share it from OneDrive.
