
Unable to add second node - 8.2 cDOT - ESXi 5.1

CHRISMAKI

I am having problems adding a second node to my virtual cluster. The first node started up fine and I ran through the cluster create script. I've got the first two vNICs on a separate vSwitch, as they are the cluster interfaces. Here's what happens when booting the second node:

 

1: Join or create? join

2: Are these the IPs you want (169.254.*)? yes

3: Enter the name of the cluster: [ClusterName] <enter>

4: Joining cluster …

5: Network set up …

6: Node check …

7: Restarting Cluster Setup …

8: Revert to step #1

 

I can ping the two IPs presented in step 2 from Node 1, so the networking appears to be set up properly, but the second node won't join. Any thoughts on what I'm missing?

1 ACCEPTED SOLUTION

CHRISMAKI

Hey everyone, I just wrote a document on how to do this in 17 steps, go here.


16 REPLIES

CHRISMAKI

I should also note that the only error message I get is:

Error: Cluster join membership failed.

It does this during the Node Check so perhaps it's related to that?

SeanLuce

It sounds like you haven't changed the second node's System ID and Serial Number.

From page 31 of the Installation and Setup Guide:

9. Press the space bar when the message "Hit [Enter] to boot immediately, or any other key for command prompt. Booting in 10 seconds..." is displayed.

          You should see a VLOADER> prompt.

10. Change the Serial Number and System ID for this node:

          VLOADER> setenv SYS_SERIAL_NUM 4034389-06-2

          VLOADER> setenv bootarg.nvram.sysid 4034389062

11. Verify that the information was saved correctly by entering the following two commands:

          VLOADER> printenv SYS_SERIAL_NUM

          VLOADER> printenv bootarg.nvram.sysid

12. Enter the boot command to boot the node: VLOADER> boot

          The simulator begins the boot process with the new system id and serial number

This needs to be done before you boot the second node for the first time.  If you have already done an 'option 4' on the second node, unpack a new copy and start fresh.

It is critical that you use the values provided for the Serial Number and System ID as the new 8.2 licenses are node locked based on these values.
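
For what it's worth, in the pair quoted above the System ID is simply the serial number with the hyphens removed. Assuming that relationship holds (the guide excerpt does not state it as a rule), a small check like the one below can catch a typo in what printenv reports. The helper name check_sysid is made up for illustration, and this runs on a workstation shell, not at the VLOADER prompt.

```shell
#!/bin/sh
# Typo check for the two values entered at the VLOADER prompt.
# Assumption: the sysid equals the serial number with hyphens removed,
# as in the pair quoted above (4034389-06-2 / 4034389062).
# check_sysid is a hypothetical helper, not part of the simulator.
check_sysid() {
    serial=$1
    sysid=$2
    expected=$(printf '%s' "$serial" | tr -d '-')
    [ "$sysid" = "$expected" ]
}

if check_sysid 4034389-06-2 4034389062; then
    echo "serial and sysid agree"
else
    echo "mismatch: recheck the setenv commands"
fi
```

This only compares the two strings with each other. It does not replace using the exact values from the guide, since the licenses are tied to them.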

To get the most out of the 8.2 simulator check out these blog posts:

http://www.cosonok.com/2013/08/clustered-ontap-82-sim-maximizing.html

http://www.cosonok.com/2013/09/a-new-sim-recipe.html

Here is a link to the install guide for the sim: http://support.netapp.com/knowledge/docs/simulate_ontap/Simulate_ONTAP_8.2_Installation_and_Setup_Guide.pdf

I hope this helps!

Sean Luce

Open Systems Technologies

CHRISMAKI

An RTFM would probably have been well deserved, except that after performing those steps and changing both parameters I hit a new problem. I even deployed a fresh copy of the VM that had never been booted, as per the instructions, but ended up with the following:

--------------------------------------------------

PANIC: Can't find device with WWN 0x1400322304. Remove '/sim/dev/,disks/reservations' and restart. in SK process vha_disk_resv on release 8.2 (C) on Sun Sep 15 21:12:48 GMT 2013

version: 8.2: Tue May 21 05:58:22 PDT 2013

compile flags: x86_64

recursive PANIC: page_t has no physical address

cpuid = 0

Uptime: 38s

The operating system has halted.

Please press any key to reboot.

System halting...

cpu_reset called on cpu#0

--------------------------------------------------

Any further advice?

sgrandjean

I have exactly the same problem. The setup guide was written for ESX 4.1, and things work differently on 5.x. The layout with all the small VMDKs does not work. Using VM converter fixes this, or you can just remove hard disk 4 and recreate it.

But after changing the system ID, the node panics with this error, and maintenance mode cannot be booted.

So what's wrong here?

sgrandjean

From what I found while searching, there are several problems, and they all seem to relate to the initial startup. Once the cristeen sim has been started, changing the system ID will no longer work.

Following what I found on the net, I deployed the files again and followed the script. This time it worked.

Deploy the files and add the VM to inventory, but remove the sim VMDKs and recreate them as one big flat file (adjusting the vmx and vmdk config files accordingly). Otherwise it will not work on ESX 5.x. Then boot, press <space> as in the script, and follow the steps for changing the system ID. Boot ONTAP and join the cluster.

The NetApp PDF needs some slight adjustments, and it should point out that you need to keep the tarball or an unbooted copy of the deployed VM. As soon as you start it, you have to redeploy.

I now have a working two-node cDOT cluster and a single-node cDOT cluster.

CHRISMAKI

I guess since I figured this out a few days ago I should have posted an update.

I was working from a thick-provisioned VMDK provided to me by a colleague at NetApp. When following the PDF more recently instead of just bashing ahead, I had much more success.

Proceed as follows:

  1. Download the vsim_esx-cm.tgz and transfer it to your datastore.
  2. tar -xvzf vsim_esx-cm.tgz
  3. Now how many nodes do you want, two? Maybe a third for a replication "cluster"? Make one fewer copy of the vsim_esx-cm directory than the number of nodes you want, naming the copies accordingly. Once done, rename the initial directory (vsim_esx-cm) to whatever the final node is so that you have a uniform naming syntax.
  4. Browse your datastore for the first node's directory and import the VMX file. Boot this node to start your cluster.
  5. Browse your datastore for the second directory, on the VERY FIRST BOOT enter the loader and change the SYSID stuff listed in the setup guide.
  6. Subsequent nodes in the same cluster will all require new SYSIDs but really two nodes in a cluster should be sufficient.
  7. On first full boot, i.e.: not having entered the loader, you'll want to hit up the maintenance menu for option 4.
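
Steps 2 and 3 can be sketched as a small shell helper. This is only an illustration, assuming the tarball extracts to a directory called vsim_esx-cm as described above. The function name clone_nodes and the node names in the usage note are hypothetical.

```shell
#!/bin/sh
# Clone a freshly extracted, never-booted simulator directory, one per node.
# clone_nodes and the node names are hypothetical; only vsim_esx-cm comes
# from the steps above.
clone_nodes() {
    if [ "$#" -lt 2 ]; then
        echo "usage: clone_nodes SOURCE_DIR NODE_NAME..." >&2
        return 2
    fi
    src=$1
    shift
    # Refuse to overwrite anything that already exists.
    for name in "$@"; do
        if [ -e "$name" ]; then
            echo "refusing to overwrite $name" >&2
            return 1
        fi
    done
    # Copy the source once for every node except the last one...
    while [ "$#" -gt 1 ]; do
        cp -R "$src" "$1" || return 1
        shift
    done
    # ...then rename the original so it becomes the final node.
    mv "$src" "$1"
}
```

After extracting the tarball, running "clone_nodes vsim_esx-cm cdot-01 cdot-02 cdot-03" from the datastore directory would leave three identically populated directories and no vsim_esx-cm. None of the copies should be booted before its SYSID has been dealt with.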

I have a third node in a single-node cluster that I intend to use as a SnapMirror target, though I have yet to set that part up. Hopefully the fact that its SYSID will match Node 1 in the two-node cluster won't matter. If it does, I'll start from scratch, changing the SYSID on first boot.

sgrandjean

Dear Chris,

Thank you for your feedback. Your email reached me just as I was heading home (late, again), but with a smile, as I managed to get this running.

I also have a dual-node cluster and a single-node one now, including all the licenses. I was just starting to create the Vserver.

The procedure is clear once you know what to do; it just takes some time to understand. Luckily I’m a VCP, so I understand some ESX.

The first boot is crucial. Something happens when you first boot the VM.

Thin or thick provisioning does not matter. The only issue is that the VMDKs that make up the sim flat file will not work out of the box.

And yes, I agree: copying the cristeen files is very, very important.

But then again, how will you create the “keys” for two extra nodes to make a four-node cluster?

Also, I found that moving the management LIF breaks my SSH session, which I did not expect… Maybe moving the LIF of the Vserver will be non-disruptive. ☺

I’m going home now and will maybe start creating a Vserver tomorrow; I still have some work to do.

Again thank you very much!

With kind regards,

Sebastian

CHRISMAKI

Sebastian,

Just for fun I will create a third node for the primary cluster today and see if the keys work. This is exactly why I have hesitated, as I'm pretty sure they won't.

Your SSH session drops because SSH is a stateful protocol: once the IP moves and the switch(es) re-ARP, you're going to get dropped. There's nothing you can do about this.

Non-technical: Twice now you've used the word "cristeen" which isn't actually a word, I think you meant to type "pristine" perhaps? Either way, "a fresh copy of the vsim" is what is required.

I'll go deploy node 3 now and update soon.

Oh, also back on the technical side, I'm doing the following on my ESXi 5.1 box:

-------------

# cat /etc/rc.local.d/local.sh

#!/bin/sh

# -- Loading Module multiextent to support NetApp vSIM 8.1.1 --

/sbin/vmkload_mod multiextent

-------------

Not sure if this is still required, but it doesn't appear to be hurting. Also, when I do a "vmkload_mod -l", I've got hits under the "Used" column.
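
To check for the module from a script rather than by eye, something like the following could work. It is a sketch that assumes "vmkload_mod -l" prints one module per line with the module name in the first column. module_listed is a made-up helper name.

```shell
#!/bin/sh
# Succeeds if the given module name appears in the first column of the
# listing read from standard input.
# Assumption: `vmkload_mod -l` prints one module per line, name first.
# module_listed is a hypothetical helper name.
module_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# On the ESXi host this would be used as:
#   vmkload_mod -l | module_listed multiextent && echo "multiextent is loaded"
```

Reading the listing from standard input keeps the helper independent of the host, so it can be tried against saved output before it goes anywhere near local.sh.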

CHRISMAKI

Just confirmed: the two sets of licenses provided here are node-locked to the initial node and to the node whose serial number you change to the one ending in -2. Additional nodes can be added, but you can't license anything on them, so there's really no point anyway.

sgrandjean

Dear Chris,

A second HA pair would be nice but adds no extra functionality. One HA pair will do for now, with snap<mirror/vault> to the single-node cluster.

As a NetApp partner I should be able to create a master key and use this. But not this week, hihihi

With kind regards,

Sebastian

sgrandjean

Dear Chris,

Non-technical: Twice now you've used the word "cristeen" which isn't actually a word, I think you meant to type "pristine" perhaps? Either way, "a fresh copy of the vsim" is what is required.

I’m sorry; of course, I’m not a native English speaker.

Indeed, I meant “fresh image”, as in untouched. So we mean the same…

With kind regards,

Sebastian

SAVBUTEAM

I am trying to add the second node to the Data ONTAP simulator cluster on ESXi 5.1, and I found your blog while googling the issue I encountered.

I have the same issue you mentioned in your blog:

PANIC: Can't find device with WWN 0x1400322304. Remove '/sim/dev/,disks/reservations' and restart. in SK process vha_disk_resv on release 8.2 (C) on Sun Sep 15 21:12:48 GMT 2013

version: 8.2: Tue May 21 05:58:22 PDT 2013

compile flags: x86_64

recursive PANIC: page_t has no physical address

cpuid = 0

Uptime: 38s

The operating system has halted.

Please press any key to reboot.

I have already switched disk 4 from “thin provision” to “thick eager provision”, and the other three disks are all thin provisioned.

But after I change the system ID and bootarg.nvram.sysid according to the PDF guide, I still see the above error.

Could you please give me some guidance?

Thank you very much.

Lin

CHRISMAKI

Hey everyone, I just wrote a document on how to do this in 17 steps, go here.

sgrandjean

Dear Chris,

Thank you for creating the updated document. I think this will do the trick.

I have been getting e-mails with questions, and now I can redirect people to your update; thanks for that.

Maybe you can add the point about the sim VMDK with extents, which should first be converted to a single file on ESX 5.x? If you do not use the VM converter, this will cause problems as well…

With kind regards,

Sebastian

CHRISMAKI

I am headed to Insight this morning but will work on a version that doesn't require the multiextent kernel module upon my return.

sgrandjean

Dear Lin,

It is a bit silly how this works…

The thing is that the procedure only works when followed exactly as documented, but they forget to tell you that you can only use clean files.

So you have node one running. You cannot use these files for node two. You really have to download the initial files again.

Thin or thick provisioning has nothing to do with it. But the initial files will not work on ESX 5.x.

So first download the tgz, extract it and “add it to inventory”. Do not start it!!!

Then delete hard disk 4 in the ESX VM settings and re-create it with 250 GB of space.

When you delete hard disk 4, please select "delete files from datastore" as well.

Now you are set. The most important thing is that the NVRAM file stays untouched; everything depends on this.

Open the console of the VM and then start the VM. Press <space>. Now follow the procedure and change the SYSTEM ID. Enter “boot” and choose option 4.

If for some reason the VM boots fully, the NVRAM VMDK gets touched and you can no longer change the SYSTEM ID, because doing so will lead to a PANIC.

I hope this will help you in getting it running, please let me know.

With kind regards,

Sebastian
