Simulator Discussions

Cannot join node to cluster

papadopoulosa

Hello...  I am having fits with CDOT vSIM 8.2.1, trying to join a second node to the cluster.  Here is what I have done:

1.     ... set, and a net switch and port set.  The cluster set does not have a NIC assigned, but the port set does.

2.     Installed vSIM CDOT 8.2.1

3.     Converted to single vmdk file

4.     Converted to template

5.     Deployed three vSIM boxes, dctcf01-01, dctcf01-03, and dctcf01-05 (leave even numbers for possible HA testing)

5.     Using systemshell, removed existing discs, and created four shelves of 14 type 36 discs.

6.     On reboot went to VLOADER and:

          a. Changed SYS_SERIAL_NUM of nodes three and five (Node01 - 4082368511, 4082368513, 4082368515)

          b. Changed bootarg.nvram.sysid of nodes three and five (Node01 - 4082368511, 4082368513, 4082368515)

7.     Created cluster dctcf01 on node 01

8.     Tried to add dctcf01-03 to cluster

          During Node Check, received error:  "Cluster join membership failed."

          "Restarting Cluster Setup"

I am befuddled as to what to do next.  I've been playing with this for weeks now.  Does anyone have an answer?


papadopoulosa

Sorry, line #1 should have stated one vswitch for the cluster (cdot) and one for vmnet use.

Also, could the problem be the dashes in the system serial number?  The copy I have does not have dashes, so I did not enter any on the additional node.

shatfield

Were they ever powered on before the 2nd step 5? (hoping for a no)

Also step 6 needs to come before the 2nd step 5. 

papadopoulosa

Yes.  They were powered on so I could go in to systemshell and modify the disc configuration.  Basically deleted all entries in ",disks" and used vsim_mkdisk to create four shelves of 9TiB SAS discs.

Then shut down, changed System Serial and SysId, re-initialized array with new root vol.

shatfield

Interesting things happen during the first boot.  Some of it happens in /var, which persists across a 4/4a.

A bootmenu->wipeconfig may clear it out.  But you'd be better off starting clean.

Change the serial & sysid first (very first boot), then go into the systemshell and change the disk population.

The easy way though is to catch the first boot, set the mode, set the serial, set the sysid, arm vdevinit with the desired disk population, then let it boot.

papadopoulosa

Sean, thanks for responding.

But what do you mean by:

1. Set the mode

2. arm vdevinit

shatfield

This sort of thing seems to come up repeatedly, so I wrote a doc:

https://communities.netapp.com/docs/DOC-33502

papadopoulosa

Thanks Sean.  I've never seen this process anywhere, I will give it a try. 

papadopoulosa

Didn't work Sean.  Followed the procedure, created the cluster on one node just fine.

Then tried to join on second node.  It found the cluster, showed me the name and asked me if I wanted to join.

When it tried to join the node, it got the same error, "cluster join membership failed".

Thanks, I appreciate your effort.

Tas

shatfield

Troubleshooting cluster join is interesting.  Cluster join can fail for a bunch of reasons, most of them problems on the cluster network.  All cluster LIFs must be able to communicate with all other cluster LIFs, MTUs must match, and packets can't be fragmented by the switch.  In the sim, use MTU 1500 and put all the cluster network ports on the same isolated network.
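One way to test the "no fragmentation" condition is a ping with the don't-fragment bit set and a payload sized to fill the MTU exactly. Here is a minimal sketch of that arithmetic in Python, assuming IPv4 with a 20-byte header (no options) and an 8-byte ICMP echo header. The function name is made up for illustration and is not part of any NetApp tool.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an IPv4 ICMP echo")
    return payload


if __name__ == "__main__":
    # 1500 is the vSIM default; 9000 is a common jumbo-frame setting.
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: max unfragmented ping payload = {max_ping_payload(mtu)} bytes")
```

If a ping of that size gets through between two cluster ports and one byte more does not, the path is carrying the MTU you expect.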

Check the vswitch/virtual network setup.  Cluster ports should be on the cluster network (which you may need to create) or on the host-only net if you run them on Workstation.  e0a/e0b are the default cluster network ports on the sims.

During the failed attempt it should have auto assigned IPs to the cluster lifs.  On the node that failed to join, can you ping the cluster lifs on the working node?

Were there any other errors or warnings during the join?

papadopoulosa

Sean thanks for taking all of this time dude.

MTUs are at the default of 1500; the vSwitch is on the ESXi host and there are no other devices except the cluster NICs of the two cluster nodes, as you said, e0a and e0b.

Yes, during the failed attempt, the second node assigned IPs.  I can ping both e0a and e0b from either of the two nodes.

No other errors, and I didn't see anything in the events.

Anyway, hopefully one of my CSEs may be able to help soon.  We've been tasked with deploying CDOT, so we need to learn and design the new architecture.

Thanks

Tas

parisi

Can you post the following output from each node?

::> net port show -instance

::> net int show -role cluster -instance

papadopoulosa

https://discovery.box.com/s/9pe6ldy8pwuqzlbjhd4y

Thank you Parisi, I've saved the output to the above BOX link.  Node 01 has a management address because it formed the cluster, but 02 doesn't, so I had to cut and paste screen shots from the VMware console.

So you will see a .txt file from Node 01, but an .rtf with screen shots from node 02.

Tas

parisi

It's asking me to sign in, and I don't have a Box account.

Can you attach to this forum?

papadopoulosa

It appears my company no longer allows open access to BOX folders.  I've tried to attach it here, but the forum editor only allows me to attach links, the A with the chain link next to it.

Do you know of another way?

papadopoulosa

Do you have an MS Live/Outlook.com account?  I can share it from OneDrive.

parisi

I don't have one, no.

What about dropbox?

papadopoulosa

Can do.

parisi

Ok, I'd try the following...

1) On dctcf02-01, set the cluster ports to flowcontrol none

2) On both nodes, set the MTU on the cluster ports to 9000

3) Have all your cluster LIFs on e0a for now

You mentioned being able to ping the LIFs... when you try a cluster join, does the cluster discover the partner's cluster LIFs?

Perhaps try manually inputting the cluster IP address into the join operation.

This is what my cluster ports and LIFs look like on my vsim:

parisi-cdot::*> net port show -role cluster -instance
  (network port show)

                           Node: parisi-cdot-01
                           Port: e0a
                           Role: cluster
                           Link: up
                            MTU: 9000
Auto-Negotiation Administrative: true
   Auto-Negotiation Operational: true
     Duplex Mode Administrative: full
        Duplex Mode Operational: full
           Speed Administrative: auto
              Speed Operational: 1000
    Flow Control Administrative: full
       Flow Control Operational: none
                    MAC Address: 00:50:56:a6:d2:25
              Up Administrative: true
               Autorevert Delay: 10
                      Port Type: physical
    Interface Group Parent Node: -
    Interface Group Parent Port: -
          Distribution Function: -
                  Create Policy: -
               Parent VLAN Node: -
               Parent VLAN Port: -
                       VLAN Tag: -
               Remote Device ID: msudheen-vsim2

                           Node: parisi-cdot-02
                           Port: e0a
                           Role: cluster
                           Link: up
                            MTU: 9000
Auto-Negotiation Administrative: true
   Auto-Negotiation Operational: true
     Duplex Mode Administrative: full
        Duplex Mode Operational: full
           Speed Administrative: auto
              Speed Operational: 1000
    Flow Control Administrative: full
       Flow Control Operational: none
                    MAC Address: 00:50:56:a6:d2:28
              Up Administrative: true
               Autorevert Delay: 10
                      Port Type: physical
    Interface Group Parent Node: -
    Interface Group Parent Port: -
          Distribution Function: -
                  Create Policy: -
               Parent VLAN Node: -
               Parent VLAN Port: -
                       VLAN Tag: -
               Remote Device ID: msudheen-vsim2

2 entries were displayed.

parisi-cdot::*> net int show -role cluster -instance
  (network interface show)

                     Vserver Name: parisi-cdot-01
           Logical Interface Name: clus1
                             Role: cluster
                    Data Protocol: none
                        Home Node: parisi-cdot-01
                        Home Port: e0a
                     Current Node: parisi-cdot-01
                     Current Port: e0a
               Operational Status: up
                  Extended Status: -
                       Numeric ID: 1024
                          Is Home: true
                  Network Address: 172.31.3.68
                          Netmask: 255.255.192.0
              Bits in the Netmask: 18
                  IPv4 Link Local: -
               Routing Group Name: c172.31.0.0/18
            Administrative Status: up
                  Failover Policy: nextavail
                  Firewall Policy: cluster
                      Auto Revert: true
                      Sticky Flag: false
    Fully Qualified DNS Zone Name: none
          DNS Query Listen Enable: false
   Load Balancing Migrate Allowed: false
             Load Balanced Weight: load
              Failover Group Name: system-defined
                         FCP WWPN: -
                   Address family: ipv4
                          Comment: -

                     Vserver Name: parisi-cdot-01
           Logical Interface Name: clus2
                             Role: cluster
                    Data Protocol: none
                        Home Node: parisi-cdot-01
                        Home Port: e0a
                     Current Node: parisi-cdot-01
                     Current Port: e0a
               Operational Status: up
                  Extended Status: -
                       Numeric ID: 1027
                          Is Home: true
                  Network Address: 172.31.3.69
                          Netmask: 255.255.192.0
              Bits in the Netmask: 18
                  IPv4 Link Local: -
               Routing Group Name: c172.31.0.0/18
            Administrative Status: up
                  Failover Policy: nextavail
                  Firewall Policy: cluster
                      Auto Revert: true
                      Sticky Flag: false
    Fully Qualified DNS Zone Name: none
          DNS Query Listen Enable: false
   Load Balancing Migrate Allowed: false
             Load Balanced Weight: load
              Failover Group Name: system-defined
                         FCP WWPN: -
                   Address family: ipv4
                          Comment: -

                     Vserver Name: parisi-cdot-02
           Logical Interface Name: clus1
                             Role: cluster
                    Data Protocol: none
                        Home Node: parisi-cdot-02
                        Home Port: e0a
                     Current Node: parisi-cdot-02
                     Current Port: e0a
               Operational Status: up
                  Extended Status: -
                       Numeric ID: 1015
                          Is Home: true
                  Network Address: 172.31.57.237
                          Netmask: 255.255.192.0
              Bits in the Netmask: 18
                  IPv4 Link Local: -
               Routing Group Name: c172.31.0.0/18
            Administrative Status: up
                  Failover Policy: nextavail
                  Firewall Policy: cluster
                      Auto Revert: true
                      Sticky Flag: false
    Fully Qualified DNS Zone Name: none
          DNS Query Listen Enable: false
   Load Balancing Migrate Allowed: false
             Load Balanced Weight: load
              Failover Group Name: system-defined
                         FCP WWPN: -
                   Address family: ipv4
                          Comment: -

3 entries were displayed.
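If you want to compare your own output against the LIF listing above, a quick check is whether every cluster LIF lands in the same subnet. The following is a rough Python sketch, not a NetApp tool: the helper names are invented for illustration, and SAMPLE is a trimmed copy of the fields above (only the four fields the check needs). It assumes each field is printed as "Field: value" and that a new record begins each time the first field name repeats.

```python
import ipaddress

# Trimmed copy of the "net int show -role cluster -instance" output above.
SAMPLE = """\
Vserver Name: parisi-cdot-01
Logical Interface Name: clus1
Network Address: 172.31.3.68
Netmask: 255.255.192.0

Vserver Name: parisi-cdot-01
Logical Interface Name: clus2
Network Address: 172.31.3.69
Netmask: 255.255.192.0

Vserver Name: parisi-cdot-02
Logical Interface Name: clus1
Network Address: 172.31.57.237
Netmask: 255.255.192.0
"""


def parse_records(text):
    """Turn 'Field: value' lines into one dict per record."""
    records, first_key = [], None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # skip lines that are not in 'Field: value' form
        if first_key is None:
            first_key = key
        if key == first_key:
            records.append({})  # first field seen again: new record
        records[-1][key] = value.strip()
    return records


def lif_networks(records):
    """Map 'vserver:lif' to the IPv4 network implied by address and netmask."""
    nets = {}
    for rec in records:
        iface = ipaddress.ip_interface(f"{rec['Network Address']}/{rec['Netmask']}")
        nets[f"{rec['Vserver Name']}:{rec['Logical Interface Name']}"] = iface.network
    return nets


if __name__ == "__main__":
    nets = lif_networks(parse_records(SAMPLE))
    for name, net in nets.items():
        print(name, net)
    print("same subnet:", len(set(nets.values())) == 1)
```

Leading whitespace is stripped before parsing, so the right-aligned output can be pasted in unchanged.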

shatfield

Depending on the host config, a jumbo MTU may not work, but you should see MTU-related alerts in that case.  vSims default to an MTU of 1500 for that reason.

If you're in the brewery the hosts are probably configured to handle the jumbo frames. 
