Solved: FAS2520 (double controller) complete reset

technicalJBB · 2019-09-20

Hi all, hoping to find a little help here.

I have a FAS2520 dual-controller system with external DS4246 disk shelves that I need to completely reset, and I'm running into a series of problems. So far, I've managed to reset the admin password on one controller, and I've attempted to use boot menu option 4 to issue a 4a command (to zero the disks and return the system to the setup state), but without success. I've also tried the procedure described in KB 1030427.

After 4a is issued, the serial session on controller 1 waits for a while and then shows the message 'Your boot menu selection, "4a", has been cancelled.'

If I connect a serial session to controller 2, the preboot messages appear to load normally, but:

- I cannot log in using the admin password I reset on controller 1 (the password has not synced).
- If I use boot menu option 3 to reset the password, it fails and the error message below is shown.
- An error message is displayed on boot, whether or not I select a boot option.

The error message I'm seeing on controller 2 is, "[hwhnetapp-02:mgmtgwd.rootvol.recovery.changed:EMERGENCY]: The contents of the root volume might have changed and the local management databases might be out of sync with the replicated databases. This node is not fully operational. Contact technical support to obtain the root volume recovery procedures.", followed by "[hwhnetapp-02:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED."

Any pointers here would be appreciated!

Ontapforrum · 2019-09-20

Hi,

Looking at this error, 'local management databases might be out of sync with the replicated databases', this could happen if you did NOT unjoin the node first, before proceeding with the erase.

That's because 'cluster unjoin' would have removed the node's metadata from the RDB (replicated database), leaving it as a stand-alone node, which is much easier to wipe.

I think after resetting the admin password, you should have booted up normally, unjoined the 2nd node first, and then erased it.

I am not sure whether this procedure still applies in your case, but you could try restoring the node's (out-of-sync) configuration first, then unjoining the node, and then erasing it.

Recovering a node configuration:
http://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-sag%2FGUID-3D0DFCE8-DAA3-4CB9-B249-76312A6A261E.html

Examples:
::> system configuration backup show
::> set -privilege advanced
::*> system configuration recovery node restore -backup cluster1.8hour.2011-02-22.18_15_00.7z
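If 'system configuration backup show' lists several files, a small helper can pick the newest one by the timestamp embedded in its name. This is a minimal Python sketch, not a NetApp tool: the function names are hypothetical, it assumes the names follow the <cluster>.<schedule>.<YYYY-MM-DD>.<HH_MM_SS>.7z pattern seen in the example above, and it assumes the newest backup is the one you want (you may actually need an older one taken before the node got out of sync).

```python
from datetime import datetime


def parse_backup_time(name: str) -> datetime:
    """Read the timestamp out of a backup name such as
    'cluster1.8hour.2011-02-22.18_15_00.7z' (pattern assumed from the example)."""
    parts = name.split(".")
    # Assumed layout: <cluster>.<schedule>.<YYYY-MM-DD>.<HH_MM_SS>.7z
    # Indexing from the end tolerates cluster names that contain dots.
    if len(parts) < 5 or parts[-1] != "7z":
        raise ValueError(f"unrecognized backup name: {name}")
    return datetime.strptime(f"{parts[-3]} {parts[-2]}", "%Y-%m-%d %H_%M_%S")


def newest_backup(names: list[str]) -> str:
    """Return the backup name with the latest embedded timestamp."""
    if not names:
        raise ValueError("no backup names given")
    return max(names, key=parse_backup_time)


def restore_command(name: str) -> str:
    """Build the restore command from the example for the chosen backup."""
    return f"system configuration recovery node restore -backup {name}"


if __name__ == "__main__":
    # Hypothetical listing; only the first name comes from the example above.
    backups = [
        "cluster1.8hour.2011-02-22.18_15_00.7z",
        "cluster1.daily.2011-02-22.00_10_00.7z",
        "cluster1.8hour.2011-02-22.10_15_00.7z",
    ]
    print(restore_command(newest_backup(backups)))
```

The script only builds the command string; you would still run it yourself at the advanced-privilege prompt.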

If the above procedure does not work:

1) Reboot the node and go to boot menu option 5 (Maintenance mode boot).
> aggr status
Take each aggregate offline and destroy it, one by one: all the data aggregates first, and the root aggregate last.
> aggr offline <aggr_name>
> aggr destroy <aggr_name>
> halt

2) When the system boots again, go back to the boot menu and retry option 4 (the erase).
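The ordering in step 1 (data aggregates first, root aggregate last, each taken offline before it is destroyed) can be sketched in Python. This is only an illustration: the Aggregate class and teardown_commands function are hypothetical, the script just prints the maintenance-mode commands rather than talking to the controller, and you would still read the real aggregate names, and which one is root, from the 'aggr status' output.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    # Hypothetical record for one aggregate listed by 'aggr status'.
    name: str
    is_root: bool


def teardown_commands(aggrs: list[Aggregate]) -> list[str]:
    """Return maintenance-mode commands that take every aggregate offline
    and destroy it: data aggregates first, root aggregate(s) last."""
    data = [a for a in aggrs if not a.is_root]
    root = [a for a in aggrs if a.is_root]
    commands: list[str] = []
    for aggr in data + root:
        # Offline first, then destroy, matching the order in step 1.
        commands.append(f"aggr offline {aggr.name}")
        commands.append(f"aggr destroy {aggr.name}")
    commands.append("halt")  # afterwards, boot to the menu and retry option 4
    return commands


if __name__ == "__main__":
    # Hypothetical aggregate names, for illustration only.
    example = [Aggregate("aggr0", is_root=True), Aggregate("aggr1_data", is_root=False)]
    for cmd in teardown_commands(example):
        print(cmd)
```

Sorting data aggregates ahead of root in one place keeps the "root last" rule from depending on the order in which you happen to list the aggregates.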

Good luck!


aborzenkov · 2019-09-20

https://kb.netapp.com/app/answers/answer_view/a_id/1030427/~/how-to-wipe-the-configuration-of-a-clustered-data-ontap-8.x-node-and

technicalJBB · 2019-09-20

Thanks @aborzenkov - I've tried those steps already and that's where I've gotten stuck.

OntapCore · 2019-09-20

Hello technicalJBB,

If the KB that @aborzenkov provided does not help resolve the issue, we recommend opening a NetApp technical support case: https://mysupport.netapp.com

Note that option 4a is no longer recommended. Use option 4 from the boot menu when setting up a new cluster or repurposing NetApp controllers, or option 9 if you plan to use Advanced Drive Partitioning (ADP).

Here's more information on the ONTAP Boot Menu options:

https://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-sag%2FGUID-B86616AE-D345-4B44-AA56-BBC7ABD44068.html

technicalJBB · 2019-09-20

OK, thanks - I'll see about opening a support case.

FYI, I didn't enter option 4a; I just selected option 4. The error message text still refers to it as a 4a command.

It's probably worth noting that this system is running ONTAP 8.3.1 - I don't see option 9 in the boot menu.

technicalJBB · 2019-09-23

Thanks @Ontapforrum - the aggr offline/destroy approach appears to be getting me somewhere: option 4 is now actually showing progress zeroing the disks in the shelves. I'll check in again if anything still seems odd, and mark your post as the solution if I'm able to bring this system back to the setup state.

Ontapforrum · 2019-09-23

Thanks for the update. I am positive it will work. Good luck!

technicalJBB · 2019-10-03

Thanks @Ontapforrum - this worked perfectly!