ONTAP Discussions

Removing Nodes from Cluster

NEO-BAHAMUT

Hi All,

 

We have a NetApp cluster:

OLDNode-01
OLDNode-02
OLDNode-03
OLDNode-04
NEWNode-05
NEWNode-06

NEWNode-05 and NEWNode-06 are our new controllers, and I believe I have finished migrating everything from OLDNode-01 through OLDNode-04 over to them.

 

We next want to remove Nodes 01 to 04 from our cluster completely. Is there a nice easy way to do this? Is it non-disruptive? Are there any final checks I need to do? Is ONTAP smart enough to know if things could go wrong and tell us beforehand? 

 

Just wondered what your experiences were, and whether there is anything we need to know to make sure this runs as smoothly as possible with no (or minimal) disruption.

 

1 ACCEPTED SOLUTION

TMACMD

There is a KB someplace....however:

  1. Make sure all aggregates on the nodes to be removed are empty
    1. Volumes must be deleted, not just offlined/unmounted
  2. Make sure all data LIFs (NAS/SAN) are removed from the nodes
    1. LIFs must be deleted or migrated
  3. Disable storage failover on the nodes to be removed
    1. storage failover modify -node OLDNode-01,OLDNode-02,OLDNode-03,OLDNode-04 -enabled false
  4. Preemptively check for epsilon
    1. set diag
    2. cluster show
    3. node modify -node old-node-with-epsilon -epsilon false
    4. node modify -node new-node -epsilon true
    5. cluster show (verify that epsilon moved)
  5. Try to remove a node
  6. cluster remove-node -node OLDNode-01
    1. It should just finish. If there are any issues, it will let you know
  7. When it finishes, do its HA partner next
  8. When that finishes, do the next node
  9. When that finishes, do its HA partner last
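The pre-checks in steps 1-4 can be sketched as a few show commands (standard ONTAP CLI; the comma-separated node lists and exact output formatting may vary by ONTAP version):

```
# Data volumes should be gone from the old nodes' aggregates
volume show -node OLDNode-01,OLDNode-02,OLDNode-03,OLDNode-04

# Only cluster/node-mgmt LIFs should remain homed on the old nodes
network interface show -home-node OLDNode-01,OLDNode-02,OLDNode-03,OLDNode-04

# Failover should report disabled on the old nodes
storage failover show

# Epsilon should sit on one of the new nodes before removal starts
set -privilege advanced
cluster show -fields epsilon
```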

As an aside, I like to watch the process finish. I open an SSH session to each of the nodes' service processors and kick off "system console" so I can watch the process go. I have had a couple of systems take 9 minutes and time out. That was fun. Most of the time, they are evicted cleanly in about 3-5 minutes.
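For reference, watching a node that way looks roughly like the following (the SP hostname here is an assumption; substitute your own SP IPs or names):

```
ssh admin@OLDNode-01-sp          # SSH to the node's service processor
SP OLDNode-01> system console    # attach to the console and watch the eviction
```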

 

After you are done, go to each of the 4 service processors and go to the system console. Just hit Enter.

You should get a "special boot menu". At this point, choose option 9 to kick off the ADP menu on each node.

When all 4 nodes are at the 9a -> ADP menu, kick off 9a on only one node of EACH HA pair (so, nodes 1 and 3). When it finishes, do the same on nodes 2 and 4. When that finishes, choose the option to go back to the boot menu.

At the boot menu, type "systemshell" and at the prompt type "halt".

The node will drop and stop at the LOADER.

At this point you can break back to the BMC and do a "system power off" to shut off the controllers.
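A minimal sketch of that last step, assuming a standard SP/BMC console (on most platforms Ctrl-G jumps from the system console back to the SP prompt, and the command asks for confirmation):

```
# From the system console, press Ctrl-G to reach the SP/BMC, then:
SP OLDNode-01> system power off
```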

 

 


4 REPLIES

donny_lang

TMAC's got it covered as usual, but here is the KB that he was referencing: https://docs.netapp.com/us-en/ontap/system-admin/remove-nodes-cluster-concept.html

NEO-BAHAMUT

Thanks both. I'll give this a look over and let you know how it goes.

NEO-BAHAMUT

@TMACMD @donny_lang 

Thanks both for your replies. Today I successfully removed 4 nodes from our cluster, leaving us with two fully supported nodes awaiting an ONTAP upgrade, which I will look at next.

 

Had a couple of issues which were easy enough to fix.

1. When going through the pre-reqs before removal of the first two controllers, it detected some FC LIFs which we hadn't moved yet. We aren't using FC on our new controllers, so we just deleted them, which fixed that. This allowed us to remove the first two.
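For anyone who hits the same thing, finding and deleting leftover FC LIFs looks something like this (the SVM and LIF names are placeholders, not from the thread):

```
# List any remaining FC LIFs and where they live
network interface show -data-protocol fcp

# Delete one that is no longer needed (repeat per LIF)
network interface delete -vserver svm1 -lif fc_lif_01
```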

 

2. The second issue was that during the pre-reqs it detected two MDV_CRS volumes still on both controllers. I did a vol move to our new controllers, which also fixed this.
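The MDV_CRS metadata volumes move like any other volume; the fix looked something like this (the SVM and aggregate names are placeholders, and the real MDV volume names carry a long suffix):

```
# Find the MDV_CRS volumes still on the old aggregates
volume show -volume MDV_CRS* -fields aggregate

# Move each one to an aggregate on the new controllers
volume move start -vserver <admin-svm> -volume <MDV_CRS_volume> -destination-aggregate NEWNode-05_aggr1
```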

 

After that it removed the nodes with no issues.
