2013-09-14 07:27 PM - last edited on 2014-09-29 11:30 AM by alissa
I am having problems adding a second node to my virtual cluster. The first node started up fine and I ran through the cluster create script. I've put the first two vNICs on a separate vSwitch, as they are the cluster interfaces. Here's what happens when the second node boots:
1: Join or create? join
2: Are these the IPs you want (169.254.*)? yes
3: Enter the name of the cluster: [ClusterName] <enter>
4: Joining cluster …
5: Network set up …
6: Node check …
7: Restarting Cluster Setup …
8: Back to step 1
When I ping the two IPs presented in step 2 from Node 1, they respond, so the networking appears to be set up properly, but the second node won't join. Any thoughts on what I'm missing?
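As a side check on step 2: the 169.254.* addresses are IPv4 link-local addresses, so they are only reachable between interfaces on the same layer-2 segment (here, the dedicated vSwitch). A minimal Python sketch that confirms a pair of cluster IPs falls in that range. The addresses are hypothetical, since the thread only shows "169.254.*", and the function name is made up for illustration:

```python
import ipaddress

# Hypothetical cluster-interface addresses; the thread only shows "169.254.*".
cluster_ips = ["169.254.10.11", "169.254.10.12"]

def all_link_local(addresses):
    """Return True if every address is in the link-local range (169.254.0.0/16 for IPv4)."""
    return all(ipaddress.ip_address(a).is_link_local for a in addresses)

print(all_link_local(cluster_ips))
```

A successful ping plus a link-local check only shows layer-2 reachability; it says nothing about whether the node identity (serial number, system ID) is acceptable to the cluster.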
2013-09-14 08:08 PM
I should also note that the only error message I get is:
Error: Cluster join membership failed.
It does this during the Node Check so perhaps it's related to that?
2013-09-15 05:46 AM
It sounds like you haven't changed the second node's System ID and Serial Number.
From page 31 of the Installation and Setup Guide:
9. Press the space bar when the message "Hit [Enter] to boot immediately, or any other key for command prompt. Booting in 10 seconds..." is displayed.
You should see a VLOADER> prompt.
10. Change the Serial Number and System ID for this node:
VLOADER> setenv SYS_SERIAL_NUM 4034389-06-2
VLOADER> setenv bootarg.nvram.sysid 4034389062
11. Verify that the information was saved correctly by entering the following two commands:
VLOADER> printenv SYS_SERIAL_NUM
VLOADER> printenv bootarg.nvram.sysid
12. Enter the boot command to boot the node:
VLOADER> boot
The simulator begins the boot process with the new system id and serial number.
This needs to be done before you boot the second node for the first time. If you have already done an 'option 4' on the second node, unpack a new copy and start fresh.
It is critical that you use the values provided for the Serial Number and System ID as the new 8.2 licenses are node locked based on these values.
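In the values quoted above, the System ID is the Serial Number with the hyphens removed (4034389-06-2 becomes 4034389062). That relationship is only an observation from this one pair, not something the guide states, but it gives a quick sanity check before typing the values at the VLOADER prompt. A minimal Python sketch, with hypothetical function names:

```python
def sysid_from_serial(serial):
    """Strip hyphens from a serial number.

    Assumption: sysid == serial without hyphens, inferred only from
    the single pair of values quoted from the guide.
    """
    return serial.replace("-", "")

def vloader_commands(serial):
    """Build the two setenv lines from step 10 for a given serial number."""
    return [
        f"setenv SYS_SERIAL_NUM {serial}",
        f"setenv bootarg.nvram.sysid {sysid_from_serial(serial)}",
    ]

# Print the lines as they would be typed at the loader prompt.
for line in vloader_commands("4034389-06-2"):
    print("VLOADER>", line)
```

This is only a typo guard. Stick to the exact values the guide provides, since the licenses are locked to them.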
To get the most out of the 8.2 simulator check out these blog posts:
http://www.cosonok.com/2013/08/clustered-ontap-82-sim-maximizing.html
http://www.cosonok.com/2013/09/a-new-sim-recipe.html
Here is a link to the install guide for the sim: http://support.netapp.com/knowledge/docs/simulate_ontap/Simulate_ONTAP_8.2_Installation_and_Setup_Guide.pdf
I hope this helps!
Sean Luce
Open Systems Technologies
2013-09-15 07:59 PM
An RTFM would probably have been well deserved, except that performing those steps gave me a new problem. After changing both of those parameters, even on a freshly deployed copy of the VM that had never been booted, as per the instructions, I ended up with the following:
--------------------------------------------------
PANIC: Can't find device with WWN 0x1400322304. Remove '/sim/dev/,disks/reservations' and restart. in SK process vha_disk_resv on release 8.2 (C) on Sun Sep 15 21:12:48 GMT 2013
version: 8.2: Tue May 21 05:58:22 PDT 2013
compile flags: x86_64
recursive PANIC: page_t has no physical address
cpuid = 0
Uptime: 38s
The operating system has halted.
Please press any key to reboot.
System halting...
cpu_reset called on cpu#0
--------------------------------------------------
Any further advice?
2013-10-03 05:04 AM
I have exactly the same problem. The setup guide was written for ESX 4.1, and things work differently on 5.x. The layout with all the small VMDKs does not work there. Using VM converter fixes this, or you can just remove hard disk 4 and recreate it.
But after changing the system ID, the node panics, and it cannot be booted into maintenance mode either.
So what's wrong here?
2013-10-03 08:20 AM
From what I found searching, there are several problems, and they all seem to relate to the initial startup. Once the cristeen sim has been started, changing the system ID no longer works.
As suggested elsewhere on the net, I deployed the files again and followed the script. This time it worked.
Deploy the files and add the VM to inventory, but remove the sim's VMDKs and recreate them as one big flat file (make the matching adjustments in the vmx and vmdk config files). Otherwise it will not work in ESX 5.x. Then boot, press <space> as in the script, and follow the steps for changing the system ID. Boot ONTAP and join the cluster.
The NetApp PDF needs some slight adjustments, and it should point out that you need to keep the tarball or the deployed VM untouched. As soon as you start it, you have to redeploy.
I now have a working two-node cDOT cluster and a single-node one.
2013-10-03 09:23 AM
I guess since I figured this out a few days ago I should have posted an update.
I was working from a thick-provisioned VMDK provided to me by a colleague at NetApp. When following the PDF more recently instead of just bashing ahead, I had much more success.
Proceed as follows:
I have a third node in a single-node cluster that I intend to use as a SnapMirror target, though I have yet to set that part up. Hopefully the fact that its SYSID matches Node 1 in the two-node cluster won't matter. If it does, I'll start from scratch, changing the SYSID on first boot.
2013-10-03 09:42 AM
Dear Chris,
Thank you for your feedback. Your email reached me just as I was heading home (late, again), but with a smile, as I managed to get this running.
I also have a dual-node cluster and a single-node one now, including all the licenses. I was just starting to create the Vserver.
The procedure is clear once you know what to do; it just takes some time to understand. Luckily I'm a VCP, so I understand some ESX.
The first boot is crucial. Something happens when you first boot the VM.
Thin or thick provisioning does not matter. The only catch is that the VMDKs that make up the sim will not work out of the box.
And yes I agree copying the cristeen files is very very important.
But then again how will you create the “keys” for two extra nodes to make a four node cluster?
Also, I find that moving the management LIF breaks my SSH session, which I did not expect… Maybe moving the Vserver's LIF will be non-disruptive. ☺
I'm going home now and will maybe start on the Vserver tomorrow; I still have some work to do.
Again thank you very much!
With kind regards,
Sebastian
2013-10-03 10:04 AM
Sebastian,
Just for fun I will create a third node for the primary cluster today and see if the keys work. This is exactly why I have hesitated, as I'm pretty sure they won't.
The reason your SSH session drops is that SSH is a stateful protocol: once the IP moves and the switch(es) ARP, you're going to get dropped. There's nothing you can do about this.
Non-technical: Twice now you've used the word "cristeen", which isn't actually a word. I think you meant to type "pristine", perhaps? Either way, "a fresh copy of the vsim" is what is required.
I'll go deploy node 3 now and update soon.
Oh, also back on the technical side, I'm doing the following on my ESXi 5.1 box:
-------------
# cat /etc/rc.local.d/local.sh
#!/bin/sh
# -- Loading Module multiextent to support NetApp vSIM 8.1.1 --
/sbin/vmkload_mod multiextent
-------------
Not sure if this is still required, but it doesn't appear to be hurting. Also, when I do a "vmkload_mod -l", I've got hits under the "Used" column.
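To make that "Used" check repeatable, the listing can be parsed rather than eyeballed. A minimal Python sketch, assuming each row of the `vmkload_mod -l` output starts with the module name followed by the Used count. That column order is an assumption, the sample text below is made up rather than captured from a real host, and the function name is hypothetical:

```python
def module_used_count(listing, module):
    """Return the 'Used' count for a module, or None if it is not listed.

    Assumes each row starts with the module name followed by the Used
    count; verify the column order on your own host before relying on it.
    """
    for row in listing.splitlines():
        fields = row.split()
        if len(fields) >= 2 and fields[0] == module and fields[1].isdigit():
            return int(fields[1])
    return None

# Made-up sample text, not captured from a real ESXi host.
sample = """Name         Used  Size (kb)
multiextent  2     12
"""
print(module_used_count(sample, "multiextent"))
```

A non-zero count would suggest the module is actually in use by the vsim disks; `None` would mean it was never loaded.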
2013-10-03 11:53 AM
Just confirmed: the two sets of licenses provided here are node locked, one to the initial node and the other to the node you change to the serial number ending in -2. Additional nodes can be added, but you can't license anything on them, so there's really no point anyway.