Replace FAS2040 controller module

jgiang72

I need assistance from the experts here. We have a FAS2040 set up in an active/active configuration, and both heads have disks and storage assigned to them. This is strictly a CIFS/NFS environment without block or FC. Last week controller 2 died on us, and we are currently in takeover mode on controller 1. I purchased a refurbished controller this week to replace controller 2. I took controller 2 out of the chassis, removed the CF boot card, NVRAM battery, and SFP module, and put them in the new controller that I purchased. I put it back into the storage system. I interrupted the boot with Ctrl+C and got into the BMC shell via the console. I issued BMC config and noticed that the BMC does not have the settings of my old controller, so I couldn't connect to the BMC shell via SSH. So I went ahead and updated the BMC IP, gateway, etc., but was unable to update the controller name. I can now connect to the BMC interface, but the password I have on my system doesn't work with the new controller via the BMC shell.

I thought it was supposed to boot from the CF card and load all the configuration from my dead controller onto the new controller, so that all I would have to do is assign the disks to the new system ID, but apparently that is not the case. So right now the new controller is in the chassis and powered up, but it doesn't have the correct configuration. I have the controller sitting at the boot loader.

What is the correct step-by-step way to get the new controller up and running with the configuration from the dead controller, without wiping out the existing data and configuration on my NetApp, and to take ownership of the disks that belonged to the dead controller?

So here is a quick summary of my system state. Controller 1 is currently in takeover mode. Controller 2 is in the chassis with the CF card from my old controller, but it has not booted up and does not have the correct configuration. Controller 2 is at the LOADER-B prompt.

Please help, as NetApp's instructions for replacing a FAS20xx controller module seem outdated or incorrect.

Thanks!

J

jgiang72

Our Data ONTAP version is 8.1 7-Mode, by the way.

saranraj456

Which boot option have you tried?

Saran

aborzenkov

The BMC is synchronized by Data ONTAP when it boots, so you need to complete the controller replacement and perform a giveback to allow Data ONTAP to boot on the replacement controller. I reviewed the controller replacement instructions and personally found them pretty accurate. Did you try to follow them before stating that they are outdated?

jgiang72

OK, here is what I tried.

  1. Moved the CF card and battery over to the new controller
  2. Reconnected all cabling and put the controller back in the chassis, with controller 1 still in takeover mode
  3. Interrupted the boot process via the console with Ctrl+C during boot
  4. At the "LOADER-B>" prompt, typed boot_diags
  5. Ran mb
  6. Exited
  7. Ran boot_ontap
  8. Pressed Ctrl+C during boot and got the menu below

Please choose one of the following:

(1) Normal Boot.

(2) Boot without /etc/rc.

(3) Change password.

(4) Clean configuration and initialize all disks.

(5) Maintenance mode boot.

(6) Update flash from backup config.

(7) Install new software first.

(8) Reboot node.

Selection (1-8)?

  9. Selected option 5 to enter Maintenance mode
  10. I then got the following message

In a High Availablity configuration, you MUST ensure that the partner node is (and remains) down, or that takeover is manually disabled on the partner node, because High Availability software is not started or fully enabled in Maintenance mode.

FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED

NOTE: It is okay to use 'show/status' sub-commands such as 'disk show or aggr status' in Maintenance mode while the partner is up.

Jul 18 13:43:19 [localhost:shelf.config.spha:info]: System is using single path HA attached storage only.

Please answer yes or no.

Continue with boot? no

  11. I answered no to the above, as this prompt is not mentioned anywhere in the document.
  12. The system halted after that. I could not continue, and the controller rebooted, so I pressed Ctrl+C to get it to the LOADER prompt and left it there for now. I don't want it to wipe out my configuration or data, or bring the other controller down, so I am staying on the safe side.

Any input on how to get this working again is greatly appreciated.

saranraj456

You can answer "yes" at the "Continue with boot?" prompt.


Saran

saranraj456

To be on the safe side, it is better to disable CF and then do this.

aborzenkov

"To be on the safe side, it is better to disable CF and then do this."

NO! You should never do that when the system is in takeover mode, and of course never during a controller replacement.

aborzenkov

It is safe to boot into maintenance mode just to record the system ID. The prompt says so itself: "It is okay to use 'show/status' sub-commands such as 'disk show or aggr status' in Maintenance mode while the partner is up."

If you already know the new system ID, you can skip this and proceed with disk reassignment.

jgiang72

I can look up the system ID in the BMC without entering maintenance mode, but need to get into maintenance mode to reassign the disk. When I select option 5, I get the prompt "In a High Availablity configuration, you MUST ensure that the partner node is (and remains) down, or that takeover is manually disabled on the partner node, because High Availability software is not started or fully enabled in Maintenance mode.  FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED". Nowhere does the document tell me what to do or how to make sure "that takeover is manually disabled on the partner node". I wish that netapp can produce a clear and better document, and moreover that they had a NetApp engineer monitor the forum and help us out. As of right now, none of us agrees on the correct way of doing it.

aborzenkov

need to get into maintenance mode to reassign the disk

You do NOT need to go into maintenance mode for that. What gave you that idea? The documentation quite clearly states that it is done from the partner controller.

I wish that netapp can produce a clear and better document

There is a feedback button. But again I have the feeling that you did not even read the documentation.

jgiang72

Below is straight from the NetApp document. Read step 1 on page 11 of "Replacing the controller module in a FAS20xx system". Did I misread that, or did you?

Reassigning disks on a system operating in 7-Mode

You must reassign disks before you boot the software. Some of the steps are different depending on whether the system is stand-alone or in an HA pair.

About this task

• You must apply the commands in these steps on the correct systems:
  • The target node is the node on which you are performing maintenance.
  • The partner node is the HA partner of the target node.
• Do not issue any commands relating to aggregates until the entire procedure is completed.

Steps

1. If you have not already done so, reboot the target node, interrupt the boot process by entering Ctrl-C, and then select the option to boot to Maintenance mode from the displayed menu.

You must enter y when prompted to override the system ID due to a system ID mismatch.

2. View the new system IDs by entering the following command

aborzenkov

And where, pray, do you see that you need to perform reassignment in maintenance mode?

It explains how to look up the system ID.

jgiang72

Didn't I cut and paste that section from the document, put it in bold, and tell you where to find it? I have a feeling that you are not reading at all and are just jumping to conclusions about what you think it should be. Thanks for all the comments. I was hoping someone could collaborate on a how-to and be more constructive than this. I'm not sure which document you're reading, but what you stated is certainly nowhere to be found in the document below.

Here is the link to the document: https://library.netapp.com/ecm/ecm_download_file/ECMM1280334

Thank you and have a nice day.

Anyone besides ABORZENKOV who has experience replacing a FAS2040 controller, please feel free to help me out. So far this is going nowhere.

aborzenkov

I have performed controller replacement more than once, which is why I have reason to state that the replacement procedure is correct. If you have reason to believe this procedure is incorrect, you need to open a support case and discuss it with a NetApp engineer.

For the last time: you do not perform disk reassignment from maintenance mode. You halt the controller after confirming the system ID and reassign the disks from the partner node.
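
A rough sketch of that order of operations, assuming Data ONTAP 8.1 7-Mode in an HA pair as described in this thread. The prompts and the <old_system_ID> and <new_system_ID> placeholders are illustrative, and whether advanced privilege is required for disk reassign on the partner is an assumption here. The replacement guide remains the authority for the exact steps:

```
# Target (replacement) node: boot menu option 5, Maintenance mode
*> disk show -v        # read-only: note the new system ID
*> halt                # return to the LOADER prompt

# Partner node, still in takeover
partner(takeover)> priv set advanced       # assumption: may be required
partner(takeover)*> disk reassign -s <old_system_ID> -d <new_system_ID>
partner(takeover)*> priv set admin

# Target node
LOADER-B> boot_ontap   # boots and then waits for giveback

# Partner node
partner(takeover)> cf giveback
```

The only commands run on the target in Maintenance mode are a show command and halt, which is what the warning prompt says is okay while the partner is up.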
