Solved: Re: FAS3050 HA cluster factory reset

SHANE_GIBSON · 2011-05-11

Hi All,

I inherited two NetApp FAS3050 units (ONTAP image 10.0.1P2) with 4 full shelves. These units are/were set up in an HA cluster. I'm fine with running in an HA cluster, but I want/need to wipe the units clean and start with a fresh slate. It's been *years* since I've messed with ONTAP/NetApp. I've been poking around and have found lots of "almost right" solutions, but I keep tripping over the HA bit. Nothing I do is allowed: the HA cluster exists, the replication service doesn't seem to be running, and all commands fail.

I've tried using the "init" command in the Boot Menu, but I get a message saying:

Sorry, this option is not valid once the system has already been configured.

If I try the "setup" command, I can input a "hostname" and "description", but then I get another error:

ERROR: Failed to change the node name/location

I've seen the following suggestion:

priv set diag

halt -c factory

The "priv set diag" part does not seem to work here; I got it working with:

set -privilege diagnostic

But halt on this system doesn't take a "-c factory" argument; all I get is:

sd1gx1a::*> halt ?
  (system halt)
  [-node] <nodename>                    Node
  [[-inhibit-takeover] (true|false)]    Disallow storage takeover by partner
  [[-reason] <text>]                    Reason for shutdown

If I try to delete ... well, anything (volumes, vservers, cluster stuff ... whatever), I get an error:

sd1gx1a::volume*> unmount -vserver vs0 -volume sd1nv001

ERROR: command failed: Failed to delete junction record for volume 'sd1nv001' in virtual server 'vs0'. Reason: Local unit offline.

sd1gx1a::volume*> delete -vserver vs0 -volume sd1nv001

WARNING: Are you sure you want to destroy volume sd1nv001 in vserver vs0 ?
(y or n): y

ERROR: command failed: Failed to load job for Delete sd1nv001: Replication service is offline

In summary - I'm looking for a quick how-to on how to NUKE this system and drop it back to factory defaults, so I can start clean. This HA cluster part is really throwing a wrench in the works for me.

Thanks in advance for any help.

These units are both running ONTAP 10.0.1P2 images.

mitchells · 2011-05-12

Shane,

The problem is that you were following Data ONTAP (7G) directions on a Data ONTAP GX system.

Procedure

WARNING: This procedure wipes all configuration and flexvol data from this pair of nodes and zeroes all directly attached disks. It is destructive and cannot be undone if performed unintentionally: all data and configuration will be removed from the HA pair of nodes.

To wipe the nodes clean, follow these steps:

Attach a serial console to both these nodes.
Reboot the nodes from ngsh, one at a time:
::> node reboot -node node-5 -reason "Do I need a reason"
Watch carefully as the nodes reboot. When you see the message "Starting AUTOBOOT press any key to abort...", press any key and it will drop you to the CFE prompt.
Proceed to the other node:
::> node reboot -node node-6 -reason "Do I need a reason"
Let this node start the AUTOBOOT. During the bootup, you will see this banner approximately 20-25 seconds after autoboot starts:
    *******************************
    *                             *
    * Press Ctrl-C for Boot Menu. *
    *                             *
    *******************************
    Press Ctrl+C a few times after you see this message. Somewhere in the middle of all the logging, you will see:

    Boot Menu will be available.
    It will drop you to a "Boot Menu" where you will see the following:

    How would you like to continue booting?
        (normal)      Normally
        (install)     Install new software first
        (setup)       Run setup first
        (init)        Initialize disks and create flexvol
        (maint)       Boot into maintenance mode
        (syncflash)   Update flash from backup config
        (reboot)      Reboot device
    Please make a selection:

    Enter the following selection, which is not listed in the menu:

    wipeconfig
It will ask:
    This will destroy your filesystem (and the SFO partner as well). Are you sure you want to continue? (y or n):
Answer y for yes.
The wipeconfig procedure will run on that node and all disks will be zeroed, which may take approximately 48 minutes with 146GB drives.
Once disk zeroing is complete, the node will reboot and start the AUTOBOOT process again. It will boot up, clean up a few things automatically, and then reboot on its own again. It will restart AUTOBOOT and then stop at the "Boot Menu", waiting for you to configure the node.
Now we will perform some steps on the partner node to clean it up. Running "wipeconfig" again isn't necessary, since all disks have already been zeroed; we just need to clear some local configuration information off the node. The partner node will be sitting at the "CFE>" prompt, since we broke it out of autoboot at the beginning of this process. We will set a few CFE environment variables and then start Data ONTAP GX, which will read those variables and perform the necessary actions.

    Here are the two variables we will set:

    CFE> setenv bootarg.init.wipeclean true
    CFE> setenv bootarg.init.clearvarfsnvram true
    To start the booting process:

    CFE> boot_ontap

    It will start to boot up, clean up a few things automatically, and then reboot on its own again. It will restart AUTOBOOT and then stop at the "Boot Menu", waiting for you to configure the node.
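The timing-sensitive part of this procedure is catching "Starting AUTOBOOT press any key to abort" on one node and the "Press Ctrl-C for Boot Menu." banner on the other. If you would rather script the serial consoles than watch them, a minimal Python sketch could look like the one below. It assumes pyserial on the machine attached to the console and a 9600-baud port at /dev/ttyUSB0; the port name, baud rate, and the helper names interrupt_on_prompt and ReplayPort are all hypothetical, not part of any NetApp tooling.

```python
"""Hypothetical helper: wait for a console banner, then send a key."""


def interrupt_on_prompt(port, prompt, key, repeats=1, max_bytes=5_000_000):
    """Read from `port` until `prompt` appears, then write `key` `repeats` times.

    `port` is anything with read(n) -> bytes and write(bytes), such as a
    pyserial Serial object. Returns True if the prompt was seen, False if a
    read returned nothing (timeout/EOF) or max_bytes were consumed first.
    """
    if not prompt:
        raise ValueError("prompt must be non-empty")
    window = b""
    for _ in range(max_bytes):
        chunk = port.read(1)
        if not chunk:  # pyserial returns b"" when its read timeout expires
            return False
        # Keep only the last len(prompt) bytes: enough to detect a match.
        window = (window + chunk)[-len(prompt):]
        if window == prompt:
            for _attempt in range(repeats):
                port.write(key)
            return True
    return False


class ReplayPort:
    """Stand-in for a serial port that replays captured console output."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.sent = []

    def read(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def write(self, data):
        self.sent.append(data)
        return len(data)


if __name__ == "__main__":
    # Offline demo with sample text. On real hardware (assumption):
    #   import serial
    #   port = serial.Serial("/dev/ttyUSB0", 9600, timeout=600)
    port = ReplayPort(b"boot messages\r\nStarting AUTOBOOT press any key to abort...\r\n")
    print(interrupt_on_prompt(port, b"Starting AUTOBOOT press any key to abort", b" "))
    print(port.sent)
```

For the partner node you would wait for b"Press Ctrl-C for Boot Menu." and send b"\x03" with repeats=3, matching the advice to press Ctrl+C a few times. Reading one byte at a time is slow, but it keeps the matching logic trivial, and a serial console is slow anyway.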

So now there will be two nodes sitting at the "Boot Menu" waiting to be configured. Be sure not to confuse the two, as their identities will be configured in 1010920.
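The 48-minute figure for 146GB drives is the only timing data point given here. If your shelves hold larger drives, a rough planning estimate can be sketched in Python, assuming that zeroing time scales linearly with drive capacity and that all disks are zeroed in parallel, so the largest drive sets the wall-clock time. Both are assumptions, not NetApp specifications, and the helper names are hypothetical.

```python
"""Rough wipeconfig zeroing-time estimate, scaled from 48 min per 146 GB."""

REF_GB = 146        # drive size quoted in the procedure
REF_MINUTES = 48    # approximate zeroing time quoted for that size


def implied_rate_mb_s(ref_gb=REF_GB, ref_minutes=REF_MINUTES):
    """Per-drive zeroing rate implied by the reference point (decimal MB/s)."""
    return ref_gb * 1000 / (ref_minutes * 60)


def estimate_zero_minutes(drive_sizes_gb, ref_gb=REF_GB, ref_minutes=REF_MINUTES):
    """Estimate minutes to zero a set of drives, assuming parallel zeroing.

    With parallel zeroing the largest drive dominates; its time is scaled
    linearly from the reference point.
    """
    largest = max(drive_sizes_gb)
    return ref_minutes * largest / ref_gb


if __name__ == "__main__":
    print(f"implied rate: {implied_rate_mb_s():.1f} MB/s per drive")
    # 4 shelves of 14 drives each, purely as an example layout
    print(f"146 GB drives: {estimate_zero_minutes([146] * 56):.0f} min")
    print(f"300 GB drives: {estimate_zero_minutes([300] * 56):.0f} min")
```

Under these assumptions the quoted figure works out to roughly 50 MB/s per drive, and 300GB drives would take on the order of 100 minutes. Treat it as a ballpark for scheduling, not a prediction.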


scottgelb · 2011-05-11

It sounds like you want to put ONTAP 7.3 on this system rather than GX? Here is a sample procedure. Use it at your own risk; you will lose all data on the controllers. You will also need 7G licenses once ONTAP 7.3 is installed. If you want to keep GX, you can unjoin the cluster on each node, init the disks, and then create a new cluster within GX...

This procedure reverts GX to 7.3, if that is the goal. Again, use it at your own risk, and you will need new licenses (approvals, etc.).

Make sure the GX SFO (storage failover) pair is not in takeover and that the cluster is healthy:
- GX::> storage fail show
- GX::> cluster show
node::> system halt -node nodename* -inhibit-takeover true
Stop the filer at the CFE/LOADER prompt (the 3050 uses CFE).
CFE> unsetenv ONTAP_NG
CFE> set-defaults
CFE> bye
Break into the CFE firmware again by pressing any key at “Starting AUTOBOOT press any key to abort” during the next boot
Configure an onboard Ethernet port for netboot:
- CFE> ifconfig e0a -addr=filer_addr -mask=netmask -gw=gateway -dns=dns_addr -domain=dns_domain
Ping the TFTP server to test connectivity (the TFTP server here is 192.168.1.200):
CFE> ping ip
Netboot the ONTAP 7G Kernel
- CFE> netboot tftp://ip/7351P4_netboot.e.gz # ELF code on your 3050
Press CTRL-C for special boot menu
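The netboot steps are easy to get wrong by hand, for example with a gateway that is not on the filer's subnet. A minimal Python sketch that validates the addresses with the standard ipaddress module and emits the CFE command lines, in the same syntax as the steps above, might look like this. The function name, the example addresses, and the example.com domain are hypothetical placeholders; substitute your own values.

```python
"""Hypothetical helper: build the CFE netboot commands from validated inputs."""
import ipaddress


def cfe_netboot_commands(port, filer_addr, netmask, gateway, dns, domain,
                         server, image):
    """Return the CFE ifconfig/ping/netboot lines, validating each address."""
    # ip_interface accepts "addr/dotted-netmask" and raises ValueError if bad.
    iface = ipaddress.ip_interface(f"{filer_addr}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    dns_ip = ipaddress.ip_address(dns)
    srv = ipaddress.ip_address(server)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not on {iface.network}")
    if iface.ip in (iface.network.network_address,
                    iface.network.broadcast_address):
        raise ValueError(f"{iface.ip} is not a usable host address")
    return [
        f"ifconfig {port} -addr={iface.ip} -mask={iface.netmask} "
        f"-gw={gw} -dns={dns_ip} -domain={domain}",
        f"ping {srv}",
        f"netboot tftp://{srv}/{image}",
    ]


if __name__ == "__main__":
    for line in cfe_netboot_commands(
            "e0a", "192.168.1.50", "255.255.255.0", "192.168.1.1",
            "192.168.1.10", "example.com", "192.168.1.200",
            "7351P4_netboot.e.gz"):
        print("CFE>", line)
```

The sketch never talks to the filer; it only checks address syntax and the gateway/subnet relationship, and you still paste the lines at the CFE prompt yourself.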

7G special boot menu: boot into maintenance mode

(1) Normal boot.

(2) Boot without /etc/rc.

(3) Change password.

(4) Assign ownership and initialize disks for root volume.

(4a) Same as option 4, but create a flexible root volume.

(5) Maintenance mode boot.

o Selection (1-5)? 5

o Continue with boot? yes

• Remove Disk Ownership

o *> disk remove_ownership all

o *> disk show -v

• Netboot again, zero 3 root disks and create root volume (4a) and go through the setup wizard

o *> halt

o CFE> bye

o CFE> ifconfig e0a -addr=filer_addr -mask=netmask -gw=gateway -dns=dns_addr -domain=dns_domain

o CFE> ping ip-tftpserver

o CFE> netboot tftp://ip/7351P4_netboot.e.gz

o Press CTRL-C for special boot menu # same menu as above

o Selection (1-5)? 4a # disks zero and ONTAP comes up to “setup”

o Answer the setup prompts (note: ONTAP is not yet on the flash or disks)

• Auto-assign drives (note: we only zeroed what we needed for root, so some GX aggregates may still exist; we could have assigned all drives before running 4a above and then skipped this step)

o ONTAP 7G will auto-assign all disks in the root aggregate disk loop

o node> aggr status

o For any foreign aggregates

node> aggr destroy aggrname

• Push the Full ONTAP Image, run DOWNLOAD (must do this)

o node> software update http://192.168.1.200/7351P4_setup_e.exe -r

• Check flash card for image

o node> version -b

• Add licenses

o node> license add XXXXXX XXXXXX XXXXXX

• Configure CIFS/NFS (or wait if you don’t want to yet)

• Reboot to commit ONTAP update

o node> reboot -t 0

• Enable Cluster and complete 7G setup

o node> cf enable
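The "software update http://.../7351P4_setup_e.exe -r" step needs a web server that the filer can reach. If you don't have one handy, Python's standard library can serve a directory: "python3 -m http.server 80 --directory /path/to/images" does it from the shell, and the sketch below does the same from a script so it can be stopped cleanly afterward. The directory path is hypothetical, and binding to port 80 usually requires root/administrator rights.

```python
"""Serve a directory of ONTAP images over HTTP (Python 3.7+ standard library)."""
import functools
import http.server
import threading


def serve_images(directory, port=80):
    """Start a background HTTP server rooted at `directory` and return it."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    httpd = http.server.ThreadingHTTPServer(("", port), handler)
    # Daemon thread so the script can exit even if shutdown() is never called.
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd


# Example (hypothetical path; port 80 usually needs root):
#   server = serve_images("/srv/ontap")
#   ... on the filer: software update http://<this-host>/7351P4_setup_e.exe -r
#   server.shutdown(); server.server_close()
```

SimpleHTTPRequestHandler serves files read-only and logs each request to stderr, which is a handy way to confirm the filer actually fetched the image.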

CLUSTERMAGNET · 2011-05-11

Guys, when I do netboot http://blah/ontap_boot_.q and it comes back as non-ELF... does it expect something .gz?

Thanks!

aborzenkov · 2011-05-11

No. It expects a different boot image ☺ There are 4 different images for different types of controllers. Go to the Data ONTAP downloads on NOW and check which one is suitable for your hardware.

scottgelb · 2011-05-11

The 3050 is ELF, i.e. the E code.
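Since the 3050 needs the ELF ("E") image, you can sanity-check a downloaded netboot file before pointing CFE at it. A small Python sketch, assuming the netboot image is either a raw ELF file or a gzip-compressed one (as the .e.gz name suggests), checks for the standard ELF magic bytes 0x7F 'E' 'L' 'F'. It only checks the file format, not whether the image is for the right platform or ONTAP release, and the function name is hypothetical.

```python
"""Check whether a (possibly gzip-compressed) netboot image is an ELF file."""
import gzip

ELF_MAGIC = b"\x7fELF"
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_elf_image(path):
    """Return True if the file, after gunzipping if needed, starts with ELF magic."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == GZIP_MAGIC:
        # Compressed image: inspect the first bytes of the decompressed stream.
        with gzip.open(path, "rb") as f:
            head = f.read(4)
    return head == ELF_MAGIC


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        kind = "ELF" if looks_like_elf_image(name) else "not ELF"
        print(f"{name}: {kind}")
```

A "not ELF" result is a strong hint that you grabbed the image for a different controller family, which matches the non-ELF error described above.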


SHANE_GIBSON · 2011-05-12

Scott, thanks for the lengthy reply. I don't have a NetApp support contract, and it appears that's the only way to download ONTAP? I'm also unfamiliar with the differences between the 7.3 code and the GX code.

At the end of the day - I'm happy to have:

a) the latest stable code on these units

b) keep them in an HA cluster config

c) nuke everything that's on them currently - starting fresh

I'll take a look at your suggestions today, and see if any of it works for me. Thank you.

~~shane

mitchells · 2011-05-12