2014-06-23 12:49 PM
Hello... I am having fits with CDOT vSIM 8.2.1, trying to join a second node to the cluster. Here is what I have done:
1. Created a cluster switch and port set, and a net switch and port set. The cluster set does not have a NIC assigned, but the net set does.
2. Installed vSIM CDOT 8.2.1
3. Converted to single vmdk file
4. Converted to template
5. Deployed three vSIM boxes: dctcf01-01, dctcf01-03, and dctcf01-05 (leaving the even numbers free for possible HA testing).
6. Using the systemshell, removed the existing disks and created four shelves of 14 type 36 disks.
7. On reboot, went to VLOADER and:
   a. Changed SYS_SERIAL_NUM on nodes three and five (values for nodes 01, 03, and 05: 4082368511, 4082368513, 4082368515)
   b. Changed bootarg.nvram.sysid on nodes three and five (values for nodes 01, 03, and 05: 4082368511, 4082368513, 4082368515)
8. Created cluster dctcf01 on node 01.
9. Tried to add dctcf01-03 to the cluster. During the node check, it failed with the error "Cluster join membership failed." followed by "Restarting Cluster Setup".
I am befuddled as to what to do next. I've been playing with this for weeks now. Does anyone have an answer?
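Each simulator node needs its own serial number and system ID, which is why the serial and sysid were changed on nodes three and five. A quick way to rule out a typo is to confirm that no two nodes share a value. This is a minimal sketch, not a NetApp tool: the `nodes` dictionary and the `find_duplicates` helper are hypothetical, and the node-to-value mapping is assumed from the order in which the values are listed above.

```python
from collections import Counter

# Values from the original post; mapping each value to a node is an
# assumption based on the order the values were listed (01, 03, 05).
nodes = {
    "dctcf01-01": {"SYS_SERIAL_NUM": "4082368511", "bootarg.nvram.sysid": "4082368511"},
    "dctcf01-03": {"SYS_SERIAL_NUM": "4082368513", "bootarg.nvram.sysid": "4082368513"},
    "dctcf01-05": {"SYS_SERIAL_NUM": "4082368515", "bootarg.nvram.sysid": "4082368515"},
}


def find_duplicates(nodes, var):
    """Return the values of loader variable `var` shared by more than one node."""
    counts = Counter(env[var] for env in nodes.values())
    return sorted(value for value, n in counts.items() if n > 1)


for var in ("SYS_SERIAL_NUM", "bootarg.nvram.sysid"):
    dupes = find_duplicates(nodes, var)
    print(var, "OK" if not dupes else f"duplicated: {dupes}")
```

With the values above, both variables come out unique, so a duplicate ID is probably not the cause here.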
2014-06-23 01:14 PM
Sorry, line #1 should have said: one vSwitch for the cluster (cDOT) and one for vmnet use.
Also, could the problem be the dashes in the system serial number? The copy I have does not have dashes, so I did not enter any on the additional nodes.
2014-07-07 07:35 AM
Yes. They were powered on so I could go into the systemshell and modify the disk configuration. Basically, I deleted all entries in ",disks" and used vsim_makedisks to create four shelves of 9 GB SAS disks.
Then I shut down, changed the system serial number and sysid, and re-initialized the disks with a new root volume.
2014-07-07 08:05 AM
Interesting things happen during the first boot. Some of that state is written to /var, which persists across a 4/4a (the boot menu's initialize option).
Running wipeconfig from the boot menu may clear it out, but you'd be better off starting clean.
Change the serial and sysid first (on the very first boot), then go into the systemshell and change the disk population.
The easy way, though, is to catch the first boot at the VLOADER prompt, set the mode, set the serial, set the sysid, arm vdevinit with the desired disk population, and then let it boot.
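To avoid typos across several nodes, the first-boot sequence can be generated rather than typed from memory. This is a hypothetical helper that only prints text. The variable names `bootarg.init.boot_clustered` and `bootarg.vm.sim.vdevinit`, and the `type:count:shelf` disk-population format, are assumptions to verify against the simulator documentation for your release. `SYS_SERIAL_NUM` and `bootarg.nvram.sysid` come from the original post, and the default of four shelves of 14 type 36 disks mirrors the disk layout described there.

```python
# Hypothetical helper: prints VLOADER commands for a node's very first boot.
# ASSUMPTIONS: bootarg.init.boot_clustered selects the mode, and
# bootarg.vm.sim.vdevinit takes "type:count:shelf" entries separated by commas.


def vdevinit_spec(disk_type, disks_per_shelf, shelves):
    """Build a disk-population string such as '36:14:0,36:14:1'."""
    return ",".join(f"{disk_type}:{disks_per_shelf}:{shelf}" for shelf in range(shelves))


def first_boot_commands(serial, disk_type=36, disks_per_shelf=14, shelves=4):
    """Return the setenv lines to enter at VLOADER before letting the node boot."""
    spec = vdevinit_spec(disk_type, disks_per_shelf, shelves)
    return [
        "setenv bootarg.init.boot_clustered true",  # "set the mode" (assumed name)
        f"setenv SYS_SERIAL_NUM {serial}",
        f"setenv bootarg.nvram.sysid {serial}",  # same value as the serial, as in the post
        f'setenv bootarg.vm.sim.vdevinit "{spec}"',  # "arm vdevinit" (assumed name/format)
    ]


for serial in ("4082368511", "4082368513", "4082368515"):
    print(f"# node with serial {serial}")
    print("\n".join(first_boot_commands(serial)))
```

Keeping the serial and sysid in a single argument makes it harder to give a node mismatched values by accident.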
2014-07-17 09:52 AM
Didn't work, Sean. I followed the procedure and created the cluster on one node just fine.
Then I tried to join the second node. It found the cluster, showed me the name, and asked whether I wanted to join.
When it tried to join, it failed with the same error: "Cluster join membership failed."
Thanks, I appreciate your effort.
2014-07-17 10:09 AM
Troubleshooting cluster join is interesting. A join can fail for many reasons, most of them problems on the cluster network: every cluster LIF must be able to reach every other cluster LIF, MTUs must match, and the switch must not fragment packets. In the sim, use MTU 1500 and put all the cluster network ports on the same isolated network.
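One way to test the "no fragmentation" requirement is to ping between cluster LIFs with the don't-fragment bit set, using the largest payload that fits in a single packet. Working out that size is simple arithmetic: subtract the 20-byte IPv4 header and the 8-byte ICMP echo header from the MTU. A minimal sketch, assuming IPv4 with no IP options:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_unfragmented_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_unfragmented_ping_payload(1500))  # 1472
print(max_unfragmented_ping_payload(9000))  # 8972
```

For MTU 1500, that gives 1472 bytes. If a don't-fragment ping of that size fails while smaller pings succeed, something on the path has a smaller MTU.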
Check the vSwitch/virtual network setup. Cluster ports should be on the cluster network (which you may need to create), or on the host-only network if you run the sims on VMware Workstation. e0a and e0b are the default cluster network ports on the sims.
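When several VMs are involved, it can help to write down which virtual network each node's ports attach to and check the cluster ports mechanically. This is a hypothetical sketch. The `inventory` contents, the network names "Cluster Network" and "VM Network", and the helper name are placeholders, not real settings from this thread. The sample data deliberately contains one mistake to show the output. Only the e0a/e0b default comes from the post.

```python
# Hypothetical inventory: the virtual network each node's ports attach to.
# Network names are placeholders; dctcf01-03's e0b is deliberately misplaced.
CLUSTER_PORTS = ("e0a", "e0b")  # default cluster ports on the sims

inventory = {
    "dctcf01-01": {"e0a": "Cluster Network", "e0b": "Cluster Network",
                   "e0c": "VM Network", "e0d": "VM Network"},
    "dctcf01-03": {"e0a": "Cluster Network", "e0b": "VM Network",
                   "e0c": "VM Network", "e0d": "VM Network"},
}


def misplaced_cluster_ports(inventory, cluster_net):
    """List (node, port, network) for cluster ports not attached to cluster_net."""
    return [
        (node, port, ports.get(port))
        for node, ports in sorted(inventory.items())
        for port in CLUSTER_PORTS
        if ports.get(port) != cluster_net
    ]


for node, port, net in misplaced_cluster_ports(inventory, "Cluster Network"):
    print(f"{node} {port} is on {net!r}, not the cluster network")
```

An empty result means every cluster port is on the same isolated network, which is the setup recommended above.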
During the failed attempt, it should have auto-assigned IPs to the cluster LIFs. From the node that failed to join, can you ping the cluster LIFs on the working node?
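Before pinging, it can help to put the cluster LIF addresses and MTUs from both nodes side by side and confirm they share a subnet and an MTU. This is a hypothetical sketch using Python's standard `ipaddress` module. The LIF names, addresses, and MTUs are placeholders, and the 169.254.0.0/16 link-local range is an assumption about what the setup wizard auto-assigns. The sample data includes one MTU mismatch to show the output.

```python
import ipaddress

# Hypothetical cluster LIF data from both nodes: name -> (address/prefix, MTU).
# Values are placeholders; dctcf01-03_clus1 has a deliberate MTU mismatch.
lifs = {
    "dctcf01-01_clus1": ("169.254.10.1/16", 1500),
    "dctcf01-01_clus2": ("169.254.10.2/16", 1500),
    "dctcf01-03_clus1": ("169.254.20.1/16", 9000),
    "dctcf01-03_clus2": ("169.254.20.2/16", 1500),
}


def cluster_lif_problems(lifs):
    """Report LIFs whose subnet or MTU differs from the first LIF listed."""
    problems = []
    ref_name, (ref_addr, ref_mtu) = next(iter(lifs.items()))
    ref_net = ipaddress.ip_interface(ref_addr).network
    for name, (addr, mtu) in lifs.items():
        if ipaddress.ip_interface(addr).network != ref_net:
            problems.append(f"{name}: subnet differs from {ref_name}")
        if mtu != ref_mtu:
            problems.append(f"{name}: MTU {mtu} != {ref_mtu}")
    return problems


for problem in cluster_lif_problems(lifs):
    print(problem)
```

A clean result doesn't prove the LIFs can reach each other, but it rules out two common causes before you start pinging.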
Were there any other errors or warnings during the join?