Ask The Experts

FAS2520 (double controller) complete reset

technicalJBB
7,255 Views

Hi all, hoping to find a little help here.

 

I have a FAS2520 dual-controller system with external DS4246 disk shelves that I need to completely reset, and I'm running into a series of problems. So far, I've managed to reset the admin password on one controller, and I've attempted to use boot menu option 4 to issue a 4a command (to zero the disks and return the system to the setup state), but I'm not having any luck. I've also tried the procedure described in KB 1030427.

 

The serial session on controller 1, after 4a is issued, waits for a while and then shows a message that 'Your boot menu selection, "4a", has been cancelled.'

 

If I connect a serial session to controller 2, the preboot messages appear to load normally but:

  • I cannot login using the admin password I reset on controller 1 (password has not synced)
  • If I try to use the boot menu option 3 to reset the password, it does not succeed and the error message noted below is shown
  • An error message is displayed on boot, whether I select a boot option or not

The error message I'm seeing on controller 2 is, "[hwhnetapp-02:mgmtgwd.rootvol.recovery.changed:EMERGENCY]: The contents of the root volume might have changed and the local management databases might be out of sync with the replicated databases. This node is not fully operational. Contact technical support to obtain the root volume recovery procedures.", followed by "[hwhnetapp-02:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED."

 

Any pointers here would be appreciated!

1 ACCEPTED SOLUTION

Ontapforrum
7,114 Views

Hi,

 

Just looking at this error, 'local management databases might be out of sync with the replicated databases' - could this have happened because you did NOT unjoin the node first, before proceeding with the erase?

 

That's because 'cluster unjoin' removes the node's metadata from the RDB (replicated database), leaving it as a stand-alone node, which is much easier to wipe.
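
For reference, the unjoin is normally run from another healthy node in the cluster at advanced privilege (the node name below is a placeholder - substitute your own):

::> set -privilege advanced
::*> cluster unjoin -node <node-name>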

 

I think after resetting the admin password, you should have booted up normally, unjoined the 2nd node first, and then erased.


I am not sure whether this procedure is still applicable in your case, but you could try restoring the node's (out-of-sync) configuration first, then unjoining it, and then erasing it.

 

Recovering a node configuration:
http://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-sag%2FGUID-3D0DFCE8-DAA3-4CB9-B249-76312A6A261E.html

Examples:
::>system configuration backup show
::*> system configuration recovery node restore -backup cluster1.8hour.2011-02-22.18_15_00.7z


If the above procedure does not work:

1) Reboot the node and go to boot menu option 5) Maintenance mode.
>aggr status
Take each aggregate offline and destroy it, one by one: all the data aggregates first and finally the root (aggr offline, then aggr destroy).
>halt <enter>

2) When the system reboots, go to the boot menu again and retry the option '4' erase.
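
In maintenance mode, the sequence looks roughly like this (the aggregate names are placeholders - run 'aggr status' to see yours, and destroy the root aggregate last):

*> aggr status
*> aggr offline aggr1_data
*> aggr destroy aggr1_data
*> aggr offline aggr0
*> aggr destroy aggr0
*> halt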


Good luck!


8 REPLIES 8

technicalJBB
7,187 Views

Thanks @aborzenkov  - I've tried those steps already and that's where I've gotten stuck.

OntapCore
7,204 Views

Hello technicalJBB,

If the KB that @aborzenkov provided does not help resolve the issue, we recommend opening a NetApp technical support case: https://mysupport.netapp.com

 

Of note, Option 4a is no longer recommended. Please use Option 4 from the boot menu when doing a new cluster setup or when repurposing NetApp controllers, or Option 9 if you are considering Advanced Drive Partitioning (ADP).

 

Here's more information on the ONTAP Boot Menu options:

https://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-sag%2FGUID-B86616AE-D345-4B44-AA56-BBC7ABD44068.html

technicalJBB
7,185 Views

OK, thanks - I'll see about opening a support case.

 

FYI I didn't enter option 4a - I just selected option 4. The error message's descriptive text still refers to it as a 4a command.

 

It's probably worth noting that this system is running ONTAP 8.3.1 - I don't see option 9 in the boot menu.

technicalJBB
6,674 Views

Thanks @Ontapforrum - the aggr offline/destroy approach appears to be getting me somewhere - boot menu option 4 is actually showing progress zeroing the disks in the shelves now. I'll check in again if anything still seems off, and flag your post as the solution if I'm able to bring this system back to the setup state.

Ontapforrum
6,666 Views

Thanks for the update. I am positive it will work. Good luck!

technicalJBB
6,456 Views

Thanks @Ontapforrum - this worked perfectly!
