Simulator Discussions
Hello... I am having fits with CDOT vSIM 8.2.1, trying to join a second node to the cluster. Here is what I have done:
1.set, and a net switch and port set. The cluster set does not have a NIC assigned, but the port set does.
2. Installed vSIM CDOT 8.2.1
3. Converted to single vmdk file
4. Converted to template
5. Deployed three vSIM boxes, dctcf01-01, dctcf01-03, and dctcf01-05 (leave even numbers for possible HA testing)
5. Using systemshell, removed existing discs, and created four shelves of 14 type 36 discs.
6. On reboot went to VLOADER and:
a. Changed SYS_SERIAL_NUM of nodes three and five [Node01 - 4082368511, 4082368513, 4082368515]
b. Changed bootarg.nvram.sysid of nodes three and five [Node01 - 4082368511, 4082368513, 4082368515]
7. Created cluster dctcf01 on node 01
8. Tried to add dctcf01-03 to cluster
During Node Check, received error: "Cluster join membership failed."
"Restarting Cluster Setup"
I am befuddled as to what to do next. I've been playing with this for weeks now. Does anyone have an answer?
Sorry, line #1 should have stated one vswitch for the cluster (cdot) and one for vmnet use.
Also, could the problem be the dashes in the system serial number? The copy I have does not have dashes, so I did not enter any on the additional nodes.
Were they ever powered on before the 2nd step 5? (hoping for a no)
Also step 6 needs to come before the 2nd step 5.
Yes. They were powered on so I could go into systemshell and modify the disc configuration. Basically, I deleted all entries in ",disks" and used vsim_mkdisk to create four shelves of 9TiB SAS discs.
Then shut down, changed System Serial and SysId, re-initialized array with new root vol.
Interesting things happen during the first boot. Some of it happens in /var, which persists across a 4/4a.
A bootmenu->wipeconfig may clear it out. But you'd be better off starting clean.
Change the serial & sysid first (very first boot), then go into the systemshell and change the disk population.
The easy way though is to catch the first boot, set the mode, set the serial, set the sysid, arm vdevinit with the desired disk population, then let it boot.
Sean, thanks for responding.
But what do you mean by:
1. Set the mode
2. arm vdevinit
This sort of thing seems to come up repeatedly, so I wrote a doc:
Thanks Sean. I've never seen this process anywhere, I will give it a try.
Didn't work Sean. Followed the procedure, created the cluster on one node just fine.
Then tried to join on second node. It found the cluster, showed me the name and asked me if I wanted to join.
When it tried to join the node, it got the same error, "cluster join membership failed".
Thanks, I appreciate your effort.
Tas
Troubleshooting cluster join is interesting. Cluster join can fail for a bunch of reasons, most of them problems on the cluster network. All cluster lifs must be able to communicate with all other cluster lifs, MTUs must match and packets can't be fragmented by the switch. In the sim, use MTU 1500 and put all the cluster network ports on the same isolated network.
Check the vswitch/virtual network setup. Cluster ports should be on the cluster network (which you may need to create) or on the host-only net if you run them on workstation. e0a/e0b are default cluster network ports on the sims.
During the failed attempt it should have auto assigned IPs to the cluster lifs. On the node that failed to join, can you ping the cluster lifs on the working node?
Were there any other errors or warnings during the join?
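A plain ping only proves small packets get through. To test the "MTUs must match and packets can't be fragmented" point, the ping has to be sized to the MTU and sent with fragmentation disallowed. Here is a minimal sketch of the arithmetic, assuming IPv4 with no IP options (20-byte IP header, 8-byte ICMP echo header). The function name is just for illustration.

```python
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumption: plain IPv4 header with no options (20 bytes) plus the
# 8-byte ICMP echo header.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_unfragmented_ping_payload(mtu: int) -> int:
    """Return the biggest ping payload that fits in a single packet at this MTU."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 9000):
        # 1500 -> 1472, 9000 -> 8972
        print(f"MTU {mtu}: max payload {max_unfragmented_ping_payload(mtu)} bytes")
```

If a ping of that size with the don't-fragment bit set fails between two cluster LIFs while a small ping works, something in the path is not passing the configured MTU.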
Sean thanks for taking all of this time dude.
MTUs are at the default of 1500, the vSwitch is on the ESXi host, and there are no other devices on it except the cluster NICs of the two cluster nodes: as you said, e0a and e0b.
Yes, during the failed attempt, the second node assigned IPs. I can ping both e0a and e0b from either of the two nodes.
No other errors, and I didn't see anything in the events.
Anyway, hopefully one of my CSEs may be able to help soon. We've been tasked with deploying CDOT, so we need to learn and design the new architecture.
Thanks
Tas
Can you post the following output from each node?
::> net port show -instance
::> net int show -role cluster -instance
https://discovery.box.com/s/9pe6ldy8pwuqzlbjhd4y
Thank you Parisi, I've saved the output to the above BOX link. Node 01 has a management address because it formed the cluster, but 02 doesn't, so I had to cut and paste screen shots from the VMware console.
So you will see a .txt file from Node 01, but an .rtf with screen shots from node 02.
Tas
It's asking me to sign in, and I don't have a Box account.
Can you attach to this forum?
It appears my company no longer allows open access to BOX folders. I've tried to attach it here, but the forum editor only allows me to attach links, the A with the chain link next to it.
Do you know of another way?
Do you have an MS Live/Outlook.com account? I can share it from OneDrive.
I don't have one, no.
What about dropbox?
Can do.
Ok, I'd try the following...
1) On dctcf02-01, set the cluster ports to flowcontrol none
2) On both nodes, set the MTU on the cluster ports to 9000
3) Have all your cluster LIFs on e0a for now
You mentioned being able to ping the LIFs... when you try a cluster join, does the cluster discover the partner's cluster LIFs?
Perhaps try manually inputting the cluster IP address into the join operation.
This is what my cluster ports and LIFs look like on my vsim:
parisi-cdot::*> net port show -role cluster -instance
(network port show)
Node: parisi-cdot-01
Port: e0a
Role: cluster
Link: up
MTU: 9000
Auto-Negotiation Administrative: true
Auto-Negotiation Operational: true
Duplex Mode Administrative: full
Duplex Mode Operational: full
Speed Administrative: auto
Speed Operational: 1000
Flow Control Administrative: full
Flow Control Operational: none
MAC Address: 00:50:56:a6:d2:25
Up Administrative: true
Autorevert Delay: 10
Port Type: physical
Interface Group Parent Node: -
Interface Group Parent Port: -
Distribution Function: -
Create Policy: -
Parent VLAN Node: -
Parent VLAN Port: -
VLAN Tag: -
Remote Device ID: msudheen-vsim2
Node: parisi-cdot-02
Port: e0a
Role: cluster
Link: up
MTU: 9000
Auto-Negotiation Administrative: true
Auto-Negotiation Operational: true
Duplex Mode Administrative: full
Duplex Mode Operational: full
Speed Administrative: auto
Speed Operational: 1000
Flow Control Administrative: full
Flow Control Operational: none
MAC Address: 00:50:56:a6:d2:28
Up Administrative: true
Autorevert Delay: 10
Port Type: physical
Interface Group Parent Node: -
Interface Group Parent Port: -
Distribution Function: -
Create Policy: -
Parent VLAN Node: -
Parent VLAN Port: -
VLAN Tag: -
Remote Device ID: msudheen-vsim2
2 entries were displayed.
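If you save this kind of output to a text file, comparing MTUs by eye across nodes gets tedious. Here is a rough sketch that parses the "Key: value" lines and groups cluster ports by MTU. It assumes each record starts at a "Node:" line, as in the output above. The sample text is abbreviated from that output, and the function names are made up for illustration.

```python
from collections import defaultdict

# Abbreviated from the "net port show -role cluster -instance" output above.
SAMPLE = """\
Node: parisi-cdot-01
Port: e0a
Role: cluster
MTU: 9000
MAC Address: 00:50:56:a6:d2:25
Interface Group Parent Node: -
Node: parisi-cdot-02
Port: e0a
Role: cluster
MTU: 9000
MAC Address: 00:50:56:a6:d2:28
Interface Group Parent Node: -
"""


def parse_instance_output(text):
    """Split 'Key: value' lines into one dict per record.

    Assumption: a new record begins at every line whose key is exactly 'Node'.
    """
    records = []
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line (prompt echo, entry count, etc.)
        key, value = key.strip(), value.strip()
        if key == "Node":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def cluster_ports_by_mtu(records):
    """Group 'node:port' names of cluster-role ports by their MTU."""
    by_mtu = defaultdict(list)
    for rec in records:
        if rec.get("Role") == "cluster":
            by_mtu[int(rec["MTU"])].append(f'{rec["Node"]}:{rec["Port"]}')
    return dict(by_mtu)


if __name__ == "__main__":
    groups = cluster_ports_by_mtu(parse_instance_output(SAMPLE))
    if len(groups) > 1:
        print("MTU mismatch between cluster ports:", groups)
    else:
        print("Cluster port MTUs agree:", groups)
```

More than one key in the result means the cluster ports do not all share the same MTU.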
parisi-cdot::*> net int show -role cluster -instance
(network interface show)
Vserver Name: parisi-cdot-01
Logical Interface Name: clus1
Role: cluster
Data Protocol: none
Home Node: parisi-cdot-01
Home Port: e0a
Current Node: parisi-cdot-01
Current Port: e0a
Operational Status: up
Extended Status: -
Numeric ID: 1024
Is Home: true
Network Address: 172.31.3.68
Netmask: 255.255.192.0
Bits in the Netmask: 18
IPv4 Link Local: -
Routing Group Name: c172.31.0.0/18
Administrative Status: up
Failover Policy: nextavail
Firewall Policy: cluster
Auto Revert: true
Sticky Flag: false
Fully Qualified DNS Zone Name: none
DNS Query Listen Enable: false
Load Balancing Migrate Allowed: false
Load Balanced Weight: load
Failover Group Name: system-defined
FCP WWPN: -
Address family: ipv4
Comment: -
Vserver Name: parisi-cdot-01
Logical Interface Name: clus2
Role: cluster
Data Protocol: none
Home Node: parisi-cdot-01
Home Port: e0a
Current Node: parisi-cdot-01
Current Port: e0a
Operational Status: up
Extended Status: -
Numeric ID: 1027
Is Home: true
Network Address: 172.31.3.69
Netmask: 255.255.192.0
Bits in the Netmask: 18
IPv4 Link Local: -
Routing Group Name: c172.31.0.0/18
Administrative Status: up
Failover Policy: nextavail
Firewall Policy: cluster
Auto Revert: true
Sticky Flag: false
Fully Qualified DNS Zone Name: none
DNS Query Listen Enable: false
Load Balancing Migrate Allowed: false
Load Balanced Weight: load
Failover Group Name: system-defined
FCP WWPN: -
Address family: ipv4
Comment: -
Vserver Name: parisi-cdot-02
Logical Interface Name: clus1
Role: cluster
Data Protocol: none
Home Node: parisi-cdot-02
Home Port: e0a
Current Node: parisi-cdot-02
Current Port: e0a
Operational Status: up
Extended Status: -
Numeric ID: 1015
Is Home: true
Network Address: 172.31.57.237
Netmask: 255.255.192.0
Bits in the Netmask: 18
IPv4 Link Local: -
Routing Group Name: c172.31.0.0/18
Administrative Status: up
Failover Policy: nextavail
Firewall Policy: cluster
Auto Revert: true
Sticky Flag: false
Fully Qualified DNS Zone Name: none
DNS Query Listen Enable: false
Load Balancing Migrate Allowed: false
Load Balanced Weight: load
Failover Group Name: system-defined
FCP WWPN: -
Address family: ipv4
Comment: -
3 entries were displayed.
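Another quick sanity check is that every cluster LIF lands in the same subnet. Here is a small sketch using Python's standard ipaddress module with the addresses and netmasks from the output above. The dictionary layout and function name are only illustrative.

```python
import ipaddress

# Address and netmask pairs copied from the cluster LIF output above.
CLUSTER_LIFS = {
    "parisi-cdot-01:clus1": ("172.31.3.68", "255.255.192.0"),
    "parisi-cdot-01:clus2": ("172.31.3.69", "255.255.192.0"),
    "parisi-cdot-02:clus1": ("172.31.57.237", "255.255.192.0"),
}


def lif_networks(lifs):
    """Map each LIF name to the IPv4 network its address and netmask imply."""
    return {
        name: ipaddress.ip_interface(f"{addr}/{mask}").network
        for name, (addr, mask) in lifs.items()
    }


if __name__ == "__main__":
    networks = lif_networks(CLUSTER_LIFS)
    for name, net in networks.items():
        print(f"{name}: {net}")
    if len(set(networks.values())) == 1:
        print("All cluster LIFs share one subnet.")
    else:
        print("Cluster LIFs are spread across different subnets.")
```

With these values every LIF resolves to 172.31.0.0/18, which matches the routing group name shown in the output.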
Depending on the host config, a jumbo MTU may not work, but you should see MTU-related alerts in that case. vSims default to an MTU of 1500 for that reason.
If you're in the brewery the hosts are probably configured to handle the jumbo frames.