Solved: Move Root Aggregate

TMADOCTHOMAS · ‎2020-11-19

Hello,

I have to move the root aggregate on a cluster node to different disks, and I have a couple questions for anyone who's done this before.

Background: we have a 4-node AFF cluster where support is ending on two nodes on 1/31. We are going to migrate all volumes to the remaining 2 nodes and move the shelves over after depopulating them. There is currently one shelf per node, all the same size, except for a single tiny additional shelf attached to the nodes we are depopulating. I'm going to use this as my "lifeboat" to move the root aggregate to from the first shelf I depopulate so the node continues to function after removing the shelf. We are running OnTAP 9.5P14.

My plan is to run the "system node migrate-root" command, however I can't find documentation indicating how this process works. At what point is there a reboot? How long does the process take? Does anyone have insight into this?

Also, I would like to use the minimal amount of disks, as I may need the rest of this small shelf to provide buffer space during later migrations, so I am hoping to only use three disks in the disklist parameter. Does anyone know if that will work in OnTAP 9.5? Do I have to use 5 disks? We're going to decom the shelf after this is all over so I'm not concerned with long-term.

If there's some way to partition the shelf and migrate the aggregate from an ADP to an ADP configuration, that would be even better. Would love to hear any suggestions or comments!

TMACMD · ‎2020-11-19

The process uses aggregate relocation. As far as the process goes after starting the command, all the data aggrs are relocated to the node not having the root change. This way the node being changed may reboot freely without impact. The cluster is temporarily disabled. A number of node reboots occur to provision, configure and restore to the new root aggr. At the end, the data aggregates are sent back to the original node. If I recall, the node reboots 3-4 times and the process takes somewhere between 20-40 minutes per controller.

If you have access to NetApp support, check out the tools page and the the Hands On Labs. Try the ONTAP 9.7 lab. After you setup the nodes and the cluster, you can actually try out the new root there using disks. My suggestion: simply create the base cluster and no data aggregates. you will need the disks to create a new root.

Logging into the "serial port" (mentioned in the doc for the lab) allows you to watch the process.

Honestly, the first time is a little scary. You just need to let the process run to completion and do not disturb it!

View solution in original post

SpindleNinja · ‎2020-11-19

I've done it a few times. There's a pretty good KB out there that talks about it: and I believe it answers all your questions too.

It'll reboot a couple of times too. Time... depends.

https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/How_to_non-disruptively_migrate_a_node's_root_aggregate_onto_new_disks...

TMADOCTHOMAS · ‎2020-11-19

Thank you @SpindleNinja ! I've read that KB actually, and it is helpful, but it doesn't speak to whether or not you can configure three disks for the aggregate (which I hope is the case). Even though it describes in general what happens (i.e. creating new aggregate, backing up/restoring config, etc.) it seems a little vague on when / how the failover occurs. I just like to have as much advance knowledge as possible regarding what to expect so I can schedule an appropriate outage window for our CIFS Servers.

SpindleNinja · ‎2020-11-19

It mentions to note the min aggr root size via HWU and the that the min disk count based on raid type.

If I recall correctly (I was on the partner PS side of the world when I did it last).. it will do an ARL of the aggrs to the partner node and will do 2 reboots. You can also move your NAS lifs over before hand.

TMADOCTHOMAS · ‎2020-11-19

Thanks @SpindleNinja . I have a hard time navigating Hardware Universe ... I couldn't specifically figure out how to determine the number of disks in a non-partitioned root aggregate is supported. What is an ARL?

SpindleNinja · ‎2020-11-19

Look at min root size for the current and future version of and go from there and factor in RAID-DP.

Example. FAS9000 says min root aggr is is 1TiB and min root vol is 962TB. Technically.. you could get away with a 3 disk x 1.8TB SAS drive aggr.

ARL = aggr relocation. Moves the aggr ownership to the partner.

Can you also verify what would be moving to what? You want to take ADP'd root aggr and move them to non-ADP root aggrs?

I recommend opening a proactive case though to make sure you dot all your i's and cross all your t's.

TMADOCTHOMAS · ‎2020-11-19

Thanks @SpindleNinja . I think I have a better idea. Minimum size is less than the usable storage on one disk, so I think 3 disks will work. I guess we'll see! Opening a case is probably a good idea, just wasn't sure if you could open a case just to vet a plan.

SpindleNinja · ‎2020-11-19

I often did open pro-active cases when needed. Vetting stuff you can also check with your partner or NetApp SE.

TMACMD · ‎2020-11-19

The process uses aggregate relocation. As far as the process goes after starting the command, all the data aggrs are relocated to the node not having the root change. This way the node being changed may reboot freely without impact. The cluster is temporarily disabled. A number of node reboots occur to provision, configure and restore to the new root aggr. At the end, the data aggregates are sent back to the original node. If I recall, the node reboots 3-4 times and the process takes somewhere between 20-40 minutes per controller.

If you have access to NetApp support, check out the tools page and the the Hands On Labs. Try the ONTAP 9.7 lab. After you setup the nodes and the cluster, you can actually try out the new root there using disks. My suggestion: simply create the base cluster and no data aggregates. you will need the disks to create a new root.

Logging into the "serial port" (mentioned in the doc for the lab) allows you to watch the process.

Honestly, the first time is a little scary. You just need to let the process run to completion and do not disturb it!

TMADOCTHOMAS · ‎2020-11-20

Very helpful @TMACMD ! Thank you. I will actually have all data aggregates decommissioned on the node I'm doing this on by the time I run the command, so that should make it faster. I am concerned about this one comment:

----------------------

The cluster is temporarily disabled.

----------------------

Can you elaborate? I would assume you don't mean the entire cluster is literally disabled, maybe just the HA functionality on the two affected nodes? To clarify, this is a 4 node cluster. I don't want the entire remaining 3 nodes to be un-clustered during this!

The Hands on Lab is a great idea. I may try that. Thank you again!

TMACMD · ‎2020-11-20

Oh you are so right! My bad... bad verbiage

the SFO between the ha-pair is disabled not the cluster!

TMADOCTHOMAS · ‎2020-11-20

Thanks @TMACMD ! #whew#

I appreciate the insights into how the process looks. That really helps a lot!