Simulator Discussions

Cannot join node to cluster

papadopoulosa

Hello...  I am having fits with CDOT vSIM 8.2.1, trying to join a second node to the cluster.  Here is what I have done:

1.set, and a net switch and port set.  The cluster set does not have a NIC assigned, but the port set does.

2.     Installed vSIM CDOT 8.2.1

3.     Converted to single vmdk file

4.     Converted to template

5.     Deployed three vSIM boxes, dctcf01-01, dctcf01-03, and dctcf01-05 (leave even numbers for possible HA testing)

6.     Using systemshell, removed existing discs, and created four shelves of 14 type 36 discs.

7.     On reboot went to VLOADER and:

          a. Changed SYS_SERIAL_NUM of nodes three and five (node 01: 4082368511, node 03: 4082368513, node 05: 4082368515)

          b. Changed bootarg.nvram.sysid of nodes three and five (node 01: 4082368511, node 03: 4082368513, node 05: 4082368515)

8.     Created cluster dctcf01 on node 01

9.     Tried to add dctcf01-03 to cluster

          During Node Check, received error:  "Cluster join membership failed."

          "Restarting Cluster Setup"
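
The per-node serial number and sysid values listed above follow a simple pattern, which can be sketched in Python. This is only an illustration, not NetApp tooling: the helper names `node_serial` and `loader_lines` are hypothetical, the base value and node numbers come from the post, and it assumes the values are entered with `setenv` at the VLOADER prompt and that the sysid takes the same value as the serial number, as the post describes.

```python
BASE_SERIAL = 4082368511  # value the post gives for node 01


def node_serial(node_number: int) -> int:
    """Serial for a node, assuming serials advance by one per node number."""
    if node_number < 1:
        raise ValueError("node numbers start at 1")
    return BASE_SERIAL + (node_number - 1)


def loader_lines(node_number: int) -> list[str]:
    """Lines to type at the VLOADER prompt (setenv syntax is an assumption)."""
    serial = node_serial(node_number)
    return [
        f"setenv SYS_SERIAL_NUM {serial}",
        f"setenv bootarg.nvram.sysid {serial}",
    ]


if __name__ == "__main__":
    # Odd node numbers only, matching dctcf01-01, -03, and -05 in the post.
    for n in (1, 3, 5):
        print(f"dctcf01-{n:02d}")
        for line in loader_lines(n):
            print("  " + line)
```

Generating the lines from one base value makes it harder to give two nodes the same serial by accident, which matters because each node joining the cluster needs its own.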

I am befuddled as to what to do next.  I've been playing with this for weeks now.  Does anyone have an answer?


sorgenfr

It appears to be a mailbox related issue.  The message is:

fmmb.BlobNotFound: blob_id="19", owner="PARTNER", reason="Invalid Mailbox state of 0x8001"

We are starting from the beginning and will keep everyone posted.

papadopoulosa

No go.  Still getting the mailbox error.

shatfield

If it's mailbox related, try destroying the mailboxes.

Reboot, Ctrl-C for boot menu

option 5-maintenance mode boot

mailbox destroy local

mailbox destroy partner

mailbox destroy all

halt

Do this on all the nodes, then power them back up.

sorgenfr

No luck.  What log files should I be reviewing and in what directory? /mroot/etc/log or /var/log?

papadopoulosa

Issue with CDOT and ESXi 5.1 resolved.

The issue has to do with the fact that the CDOT vSIM is distributed in old style "multi-extent" vmdk's.

I had followed the procedures to load the multi-extent module to ESXi, convert the disks as shown in the VMware TR referenced in this forum, unload the module and then try starting up CDOT.  I was also changing the disc configuration and the serial # of the second node.  The trouble I was running into was that the CDOT mailboxes were always corrupted and no matter what we tried (Bill Sorgenfrei/NetApp), we could not fix them.

In the end I loaded the vSIM on VMware Workstation 10, which worked just fine.  That gave me the idea.

Luckily, I have a nested ESXi instance within my main ESXi, so I installed the multi-extent module and left it active.

I then untarred the tar files, renamed the directories they were in the way I wanted, then imported the VMs into vCenter, within my nested ESXi instance.

Prior to startup, I followed the procedures to reconfigure my discs, and system serial, NVRAM SysId, and immediately performed a New Install with Init (Option 4 on the Special Boot Commands Menu).

Everything appears to be working normally.  I will verify the installation with Bill later, but I think we've licked the issue.

So here are my take-aways:

1.     If there is anyone else out there using the ESXi version of the CDOT vSIM, can you tell us if you loaded multi-extent, or converted?

2.     I found that once a vSIM CDOT node is set up, it saves the configuration in simulated flash, and that is why a lot of people on this forum recommend untarring a fresh copy every time.  (NetApp, please tell us what to set to clear away the old configuration, so we can reconfigure the node/cluster after re-initializing the system.)

3.     NetApp, please supply a version of the vSIM for non-multi-extent systems, i.e. ESXi v5.x and above.

Thank you everyone for your help.

70tas (Tas)

papadopoulosa

BTW, how do I mark my own reply as the ANSWER?

shatfield

Glad you finally got it.

1.  I typically just delete the sim vmdk that comes with the simulator.  In most cases, a blank vmdk in IDE1:1 will be partitioned and formatted during that initial boot.  I do have a test host where I just leave multiextent enabled, along with some other VSA related optimizations, but a blank vmdk is my preferred way of dealing with it.  Most of my sim scenarios don't need 250gb anyway and this lets me use a smaller disk where appropriate.

2. Here are the places the sim stores persistent data after initial boot:

     On IDE0:0 (virtual cf card): loader environment

     On IDE0:1 (/var): the var file system

     On IDE1:0 (VNVRAM/misc).  nvram, swap, core

     On IDE1:1 (/sim). sim disks, sim tapes, lock files
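
For quick reference, the list above can be written as a small lookup table. This is a sketch in Python that only restates the mapping from this post; the names `PERSISTENT_STATE` and `disk_holding` are hypothetical and nothing here talks to a simulator.

```python
# Where the simulator keeps persistent state after initial boot,
# restated from the list in this post.
PERSISTENT_STATE = {
    "IDE0:0": ("virtual cf card", ["loader environment"]),
    "IDE0:1": ("/var", ["var file system"]),
    "IDE1:0": ("VNVRAM/misc", ["nvram", "swap", "core"]),
    "IDE1:1": ("/sim", ["sim disks", "sim tapes", "lock files"]),
}


def disk_holding(item: str) -> str:
    """Return the virtual IDE device that stores the named item."""
    for disk, (_role, contents) in PERSISTENT_STATE.items():
        if item in contents:
            return disk
    raise KeyError(item)


if __name__ == "__main__":
    for disk, (role, contents) in PERSISTENT_STATE.items():
        print(f"{disk} ({role}): {', '.join(contents)}")
```

The point of the table is that old configuration survives in four separate places, so reusing a sim means dealing with each of them.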

You could scrub it and reuse a sim, but it's really not worth the effort.  It would be something like

boot menu:systemshell: rm everything under /sim

boot menu: wipeconfig (to clear /var)

boot cycle once for the wipeconfig

loader prompt: setenv a bunch of stuff back to defaults (no set-defaults in sim loader)

Then boot and run option 4

If I'm running a lot of scenarios on a particular build I'll make a custom ovf that has autoboot off and everything else virgin so I can crank out sim instances without all the hassles. 

I'm also a big fan of 44/4a in the simulator.

papadopoulosa

Well, I got it, but I didn't.  So am I to understand that you delete and re-create IDE1:1 in non-multi-extent mode and that works?

That would be awesome!

BTW, what is 44/4a?

And finally, thanks for all of the help.

shatfield

Yes, before first boot I just delete IDE1:1, create a new blank virtual disk on ide1:1 (careful not to make a scsi disk).  Thin provisioned is usually fine.  There were a couple of builds where it didn't work (8.2.1rc?) but usually that's all there is to it.

Option 4 zeros all disks and creates a root aggregate.  44/4a zeros all disks if required and creates a root aggregate.  Since vsim_makedisks marks all the sim disks as prezeroed, it saves you a wall of dots on a new sim install.  Note it's not particularly useful in the real world, because ONTAP is factory installed if it goes through manufacturing, and new shelves ship in a nonzeroed state if they come from distribution.
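
The difference between the two options can be modelled with a toy Python sketch. It is only a model of the behaviour described above, not ONTAP code: the `SimDisk` class, the `disks_to_zero` function, and the disk names are all hypothetical, and the 4 x 14 disk layout is borrowed from the original post.

```python
from dataclasses import dataclass


@dataclass
class SimDisk:
    name: str
    prezeroed: bool


def disks_to_zero(disks, only_if_required):
    """Return the disks that would be zeroed.

    only_if_required=False models option 4 (zero every disk).
    only_if_required=True models 44/4a (skip disks marked prezeroed).
    """
    if only_if_required:
        return [d for d in disks if not d.prezeroed]
    return list(disks)


# Four shelves of 14 disks, all marked prezeroed as on a fresh sim.
# The disk names are made up for illustration.
fresh_sim = [
    SimDisk(name=f"shelf{shelf}.disk{slot}", prezeroed=True)
    for shelf in range(4)
    for slot in range(14)
]

if __name__ == "__main__":
    print("option 4 zeros:", len(disks_to_zero(fresh_sim, only_if_required=False)))
    print("44/4a zeros:", len(disks_to_zero(fresh_sim, only_if_required=True)))
```

On a fresh sim every disk is already marked prezeroed, so the 44/4a path has nothing to zero, which is where the time saving comes from.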
