Solved: HA - Assign Disks to One Node Failover Issues?

Dshrout311 · ‎2020-01-30

This may be a basic question, and I am pretty sure I already know the answer, but I wanted to verify.

A client of mine has a FAS2520 with HA Controllers running 12 - 2.42TB disks (6 per node). Due to the need to increase the capacity of their system they've purchased a new ~~DS4243~~ (typo) DS4246 shelf with 12 - 3.63TB disks.

The shelf has been connected to the controllers with HA in mind.

I was hoping to accomplish two things for them: increase their storage capacity, and add disks to the current aggregates to increase overall speed. I know I can add the new disks to their current aggregates, but because of the size difference that is generally not recommended, and over time it may slow things down.

So my question: is it possible to assign all the new disks to one of the nodes, create a single aggregate using all of them, migrate the volumes on the old aggregate to the newly created one, then wipe out the old aggregate, assign its disks to a single node, and create a new aggregate using all the 2.42TB disks? Will this configuration break HA? If a node goes down, will the aggregate that was created on that node become unavailable? Or is a better option to replace the drives in the 2520 with drives the same size as those in the new shelf?

Sorry if this is pretty basic, I'm relatively new to NetApp and am trying to make the best decision for the client.

Thanks,

SpindleNinja · ‎2020-01-30

Run Config Advisor to make sure that everything is cabled correctly.

It won't break "HA". The aggrs will just fail over, provided the cabling is correct and HA is enabled.

"create a single aggregate using all the disks" - technically yes. You'll have to have separate RAID groups. I don't think you'd even need to move things; just add the new disks as a new RAID group.

Also note that the DS4243 concerns me at this point, because those shelves are coming up on end of life and you'd be limiting the ONTAP version as well.

Dshrout311 · ‎2020-01-30

I appreciate the response.

Sorry if I wasn't clear. By creating a "single aggregate using all the disks" I meant all the 2.42TB disks. Currently the system has two aggregates of 6 disks each (6 per node for a total of 12).

With the setup I am hoping for I would have 12 - 3.63TB disks in a single aggregate with all disks being assigned to NODE1 and 12 - 2.42TB disks in a second aggregate with all disks being assigned to NODE2.

I just wanted to make sure if either node failed they wouldn't lose that entire aggregate as well and the other node would pick up the aggregate to keep things running.

As far as the shelf model, I apologize I had a typo - They have a DS4246 not a DS4243.

SpindleNinja · ‎2020-01-30

Ah got ya. I feel a bit better about a DS4246.

"With the setup I am hoping for I would have 12 - 3.63TB disks in a single aggregate with all disks being assigned to NODE1 and 12 - 2.42TB disks in a second aggregate with all disks being assigned to NODE2."

Yes, that's fine. Part of the failover process is that the aggr is relocated to its partner, which continues to serve data.

Sounds like there is also ADP in play, so you'll have to reassign just the data partition to the partner node, NOT the whole disk.

Feel free to post the output of "storage disk show -partition-ownership" and we can verify whether ADP is in play.

Also, what version of ONTAP is running?

paul_stejskal · ‎2020-01-31

Just a thought, if ADP is in play, you could move all the data partitions to node 1 and just move the new shelf to node 2 and leave the root partitions alone. I don't think ADP needs to be blown away as it's actually useful in this kind of scenario.

SpindleNinja · ‎2020-01-31

I don't think he was saying to blow ADP away; I just wasn't sure he was aware it was even there, i.e. that each disk probably has a root and a data partition in his config.

Dshrout311 · ‎2020-01-31

Sorry for the delay in response.

They are currently on ONTAP 9.7.

See below for the output of the command you requested.

In terms of ADP this is where I definitely have some learning to do.

Again, ultimately I am hoping to have two aggregates (12 x 2.42TB and 12 x 3.63TB). Currently 6 - 2.42TB drives are assigned to NODE1 and 6 - 2.42TB drives are assigned to NODE2. Is it possible to assign the drives currently assigned to NODE2 to NODE1 and then add them to the aggregate?

Hart2520CL::> storage disk show -partition-ownership
Disk     Partition Home              Owner             Home ID     Owner ID
-------- --------- ----------------- ----------------- ----------- -----------

Info: This cluster has partitioned disks. To get a complete list of spare disk
      capacity use "storage aggregate show-spare-disks".
1.0.0    Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.1    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.0.2    Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.3    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.0.4    Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.5    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372

Disk     Partition Home              Owner             Home ID     Owner ID
-------- --------- ----------------- ----------------- ----------- -----------
1.0.6    Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.7    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.0.8    Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.9    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.0.10   Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.11   Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.0    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.1    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372

Disk     Partition Home              Owner             Home ID     Owner ID
-------- --------- ----------------- ----------------- ----------- -----------
1.1.2    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.3    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.4    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.5    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.6    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.7    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.8    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.9    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.10   Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.11   Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
24 entries were displayed.
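
For anyone reading output like this for the first time, here is a minimal Python sketch (not from the thread) that tells partitioned (ADP) disks from whole disks in "storage disk show -partition-ownership" text. The function names and the trimmed `SAMPLE` are illustrative assumptions; the rows are copied from the output above.

```python
import re

# A few rows copied from the output above (trimmed for brevity).
SAMPLE = """\
1.0.0    Container Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Root      Hart2520CL-02     Hart2520CL-02     537033274   537033274
         Data      Hart2520CL-02     Hart2520CL-02     537033274   537033274
1.0.1    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Root      Hart2520CL-01     Hart2520CL-01     537033372   537033372
         Data      Hart2520CL-01     Hart2520CL-01     537033372   537033372
1.1.0    Container Hart2520CL-01     Hart2520CL-01     537033372   537033372
"""

def parse_partition_ownership(text):
    """Return {disk: {partition: owner}} from the CLI text (hypothetical helper)."""
    disks = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        # Disk rows have 6 fields: disk, partition, home, owner, home ID, owner ID.
        if len(fields) == 6 and re.fullmatch(r"\d+\.\d+\.\d+", fields[0]):
            current = fields[0]
            disks[current] = {fields[1]: fields[3]}
        # Root/Data rows have 5 fields because the disk column is blank.
        elif len(fields) == 5 and fields[0] in ("Root", "Data") and current:
            disks[current][fields[0]] = fields[2]
    return disks

def is_partitioned(partitions):
    """A disk counts as ADP-partitioned if it lists both Root and Data rows."""
    return "Root" in partitions and "Data" in partitions

ownership = parse_partition_ownership(SAMPLE)
for disk, parts in ownership.items():
    kind = "partitioned" if is_partitioned(parts) else "whole disk"
    print(disk, kind, parts.get("Data", parts["Container"]))
```

Header, separator, and "Info:" lines are skipped because they do not start with a disk name of the form shelf-stack.shelf.bay.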

SpindleNinja · ‎2020-01-31

Thanks for posting. Looks like they are ADPed! And yes, I would read up on it; it's a great feature.

Also note that the aggrs will be 11 drives each, because you need spares.

Here's what you can do though:

First off, disable disk autoassign.

->storage disk option modify -node * -autoassign off

1. Reassign all the disks in shelf 01 to controller 2 and create an aggr.

-> storage disk removeowner -disk 1.1.x

-> storage disk assign -owner node2

Create your new aggr on node 2 (either GUI or CLI).

2. Move all volumes off aggr2 if there are any left.

3. Delete aggr2 (the partitions will now show as spares).

4. You can re-assign the data partitions to controller 1 by the following:

-> set advanced

-> disk removeowner -disk x.x.x -data true
-> disk assign -disk x.x.x -data true -owner NODE1
-> disk show -fields data-owner, root-owner (to verify that everything is moved over correctly)

5. Zero all the spares.

-> storage disk zerospares

After the drives are zeroed, feel free to grow the aggr on node1.

Re-enable disk autoassign

->storage disk option modify -node * -autoassign on

Dshrout311 · ‎2020-02-03

Thanks for the help and insight.

Just to be sure, these commands are non-destructive, correct?

I want to make sure I understand each step of this process, so I hope you don't mind I've added questions/comments on the commands you've given.

"->storage disk option modify -node * -autoassign off"

I understand this as simply turning off disk auto assign. As I understand it, the disks won't automatically get assigned to a particular node.

-> storage disk removeowner -disk 1.1.x

This command is simply removing the current disk owner of the drives in the new shelf that was installed.

-> storage disk assign -owner node2

This will then assign the disks that were removed from the above command to the second node in the cluster. ~~Will not specifying the specific disks just do all available disks to that node, including available disks not on shelf01?~~ - You can disregard this question - I now understand the specific disks or disk list needs to be specified.

2. move all volumes off aggr2 if they're any left.

Simple enough, but to be sure this is moving all the volumes on the original aggregates on NODE2 to the new one I just created on NODE2, correct? There are two SVMs on this device as well, will that have any impact? Finally there are 4 total aggregates, two of which have the ability to move the volume, but the other two I believe are root aggregates and indicate there are no volumes to move when attempting to move anything. I've included a screen shot for further explanation.

3. delete aggr2 (the partitions will now show as spares)

Self-explanatory, deleting the aggregates listed in the screen shot above, but does this include the aggr0 and aggr0_Hart2520CL_02_0?

4. You can re-assign the data partitions to controller 1 by the following:

-> set advanced

Entering advanced mode in CLI

-> disk removeowner -disk x.x.x -data true

Am I correct in assuming I need to specify x.x.x? Where x.x.x would be the disks assigned to NODE2? So as I understand it, this would only be the (6) 2.42TB disks currently assigned to NODE2? For example -disk 1.0.0, 1.0.2, 1.0.4, 1.0.6, 1.0.8, 1.0.10, what does the -data true switch do?

-> disk assign -disk x.x.x -data true -owner NODE1

This is then assigning the disk to NODE1, same as above I assume I need to specify disk x.x.x, also curious what the -data true switch does here.

-> disk show -fields data-owner, root-owner (to verify that everything is moved over correctly)

Self-explanatory as this just verifies the work performed above.

5. zero all the spare

-> storage disk zerospares

This will then zero the spares that are now available after the above commands. Allowing me to assign the disks pulled from NODE2 to the aggregate on NODE1?

Re-enable disk autoassign

->storage disk option modify -node * -autoassign on

Self-explanatory - this is just re-enabling autoassign for all disks across all nodes.

Finally, as I understand it, once this is complete I will then have NODE1 with 12 disks with an aggregate and NODE2 with 12 disks and an aggregate.

Thank you very much for all your help and guidance, I have some experience in storage systems, but NetApp is pretty new to me and I start training soon, so I appreciate all the help you've given.

SpindleNinja · ‎2020-02-03

Just to be sure, these commands are non-destructive correct?

- Honestly, some can be destructive if you're not careful; others have "idiot proofing" built in. Just be careful that whatever you're running commands against doesn't have any live data on it.

I want to make sure I understand each step of this process, so I hope you don't mind I've added questions/comments on the commands you've given.

"->storage disk option modify -node * -autoassign off"

I understand this as simply turning off disk auto assign. As I understand it, the disks won't automatically get assigned to a particular node.

- Nope. This will stop the node from automatically re-assigning the disks/partitions that you are moving in the next steps.

-> storage disk removeowner -disk 1.1.x

This command is simply removing the current disk owner of the drives in the new shelf that was installed.

-Correct, these will be the ones on your external shelf

-> storage disk assign -owner node2

This will then assign the disks that were removed from the above command to the second node in the cluster. Will not specifying the specific disks just do all available disks to that node, including available disks not on shelf01? - You can disregard this question - I now understand the specific disks or disk list needs to be specified.

- I like to specify which disks I'm assigning to which node, so add a -disk x.x.x flag in there.

2. move all volumes off aggr2 if they're any left.

Simple enough, but to be sure this is moving all the volumes on the original aggregates on NODE2 to the new one I just created on NODE2, correct? There are two SVMs on this device as well, will that have any impact? Finally there are 4 total aggregates, two of which have the ability to move the volume, but the other two I believe are root aggregates and indicate there are no volumes to move when attempting to move anything. I've included a screen shot for further explanation.

- Ignore the root aggrs (the two smaller ones); you can't do anything with them. Just move everything off the aggr you're going to be deleting.

- You will need to move the SVMs' root volumes off the aggr as well. The SVMs will continue to serve data.

- Also, I would rename your root aggrs to show that they are root. I typically use N1_aggr0_root or root_aggr0_N1 or something like that for node 1.

3. delete aggr2 (the partitions will now show as spares)

Self-explanatory, deleting the aggregates listed in the screen shot above, but does this include the aggr0 and aggr0_Hart2520CL_02_0?

- Nope, it'll just delete the data aggr. I think even if you click delete on the root aggr it will say "hey dummy, you can't do that".

4. You can re-assign the data partitions to controller 1 by the following:

-> set advanced

Entering advanced mode in CLI

-> disk removeowner -disk x.x.x -data true

Am I correct in assuming I need to specify x.x.x? Where x.x.x would be the disks assigned to NODE2? So as I understand it, this would only be the (6) 2.42TB disks currently assigned to NODE2? - correct For example -disk 1.0.0, 1.0.2, 1.0.4, 1.0.6, 1.0.8, 1.0.10, what does the -data true switch do?

- You will need to specify the disk. I typically do one at a time, just in case I fat-finger something.

- With ADP, each disk has a "root" partition and a "data" partition (AFF systems have a root-data1-data2 config).

The -data true flag specifies that you want to remove the owner of just the data partition. The root partition and the physical disk container will stay with their current owner.

-> disk assign -disk x.x.x -data true -owner NODE1

This is then assigning the disk to NODE1, same as above I assume I need to specify disk x.x.x, also curious what the -data true switch does here.

- Same as above: you are assigning just the data partition to node 1. Node 2 will still own the small root partition.

-> disk show -fields data-owner, root-owner (to verify that everything is moved over correctly)

Self-explanatory as this just verifies the work performed above.

5. zero all the spare

-> storage disk zerospares

This will then zero the spares that are now available after the above commands. Allowing me to assign the disks pulled from NODE2 to the aggregate on NODE1?

- This just makes them ready to add to node 1's data aggr. If you don't zero them first, they will be zeroed during the aggr grow, which will just make the grow take longer.

Re-enable disk autoassign

->storage disk option modify -node * -autoassign on

Self-explanatory - this is just re-enabling autoassign for all disks across all nodes.

Finally, as I understand it, once this is complete I will then have NODE1 with 12 disks with an aggregate and NODE2 with 12 disks and an aggregate.

I didn't see any of the steps above to actually create the aggrs, so you will need to create 2 aggrs with all the newly assigned spares.

If you run "storage aggregate show-spare-disks" and "storage disk show -container-type spare" you should see a long list of spares.

There should be 24 1.1.x disks assigned to node 2, and then the spare partitions on node 1. Can you post the output of the above commands so we can verify before we create the aggr on node 2 and grow the aggr on node 1?

Thank you very much for all your help and guidance, I have some experience in storage systems, but NetApp is pretty new to me and I start training soon, so I appreciate all the help you've given. - No problem.
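
For step 4 above, the removeowner/assign pair has to be repeated for each of the six disks. As a minimal sketch (not from the thread), the following Python only prints the command lines so they can be reviewed and pasted one at a time; it does not talk to the cluster. The helper name is hypothetical, the disk list is the one proposed above, and the flags are the ones quoted in this thread, which need advanced privilege.

```python
def data_partition_moves(disks, new_owner):
    """Build the removeowner/assign pair for each disk's data partition.

    Hypothetical helper: it only formats the commands quoted in the thread.
    """
    commands = []
    for disk in disks:
        # Drop ownership of just the data partition; root stays where it is.
        commands.append(f"storage disk removeowner -disk {disk} -data true")
        # Hand that data partition to the new owner.
        commands.append(
            f"storage disk assign -disk {disk} -data true -owner {new_owner}"
        )
    return commands

# The six internal disks whose data partitions node 2 currently owns.
NODE2_DISKS = ["1.0.0", "1.0.2", "1.0.4", "1.0.6", "1.0.8", "1.0.10"]

for cmd in data_partition_moves(NODE2_DISKS, "Hart2520CL-01"):
    print(cmd)
```

Printing rather than executing keeps the "one disk at a time" habit recommended above: each pair can be checked against the ownership output before it is run.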

Dshrout311 · ‎2020-02-03

Thanks for the insight, I'll post the results of those commands once I get to those steps.

I'm in the process of moving the volumes to the newly created aggregate (using all 12 3.64TB drives in the new shelf) on NODE2.

One of the volumes is relatively large so I'll assume it will take a couple hours to move. I was able to move the SVM root volume off to the new aggregate as well so once this last volume is complete I should be ready to move forward.

Dshrout311 · ‎2020-02-03

As I am waiting for this final volume to move, I was reviewing our posts, and I think the last step has me a little confused. I've included a screenshot of the current disk setup as it is right now. I noticed that the 12-2.43TB disks are listed as "Shared" and the 12-3.64TB disks are listed as "Aggregate". Once these steps are complete, will the 12-2.43TB disks be Shared or Aggregate?

The sata_data_2 is the last aggregate that lives on NODE2 (other than the root aggregate), and the last volume is in the process of moving to the Shelf01Aggr that I created using the 12-3.64TB disks on NODE2. Once that volume is done moving there will no longer be any data on the sata_data_2 aggregate and it will be deleted. I guess where I am getting held up is: when I move those disks over to NODE1, will they still be shared?

"There should be 24 1.1.x that are assigned to node2. And then the spare partitions on node 1. Can you post the output of the above commands so we can verify before we create the aggr on node 2. and grow the aggr on node 1? "

That part is a little confusing for me as there are only 12 total disks in the new shelf and they've already been used to create an aggregate on NODE2.

SpindleNinja · ‎2020-02-03

That part is a little confusing for me as there are only 12 total disks in the new shelf and they've already been used to create an aggregate on NODE2.

-Sorry, forgot it was only half populated.

Re: shared, they will always be shared because they are ADPed. Check the CLI output to see which are spares.

Dshrout311 · ‎2020-02-04

OK, thanks that makes a lot more sense to me. The volume is moved over to the new aggregate and all aggregates have been deleted from NODE2 other than the root aggregate. Here is the output of the commands you requested.

Hart2520CL::> storage aggregate show-spare-disks

Original Owner: Hart2520CL-01
 Pool0
  Root-Data Partitioned Spares
                                                              Local    Local
                                                               Data     Root Physical
 Disk             Type   Class          RPM Checksum         Usable   Usable     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- -------- --------
 1.0.11           BSAS   capacity      7200 block            2.28TB  143.7GB   2.43TB zeroed

Original Owner: Hart2520CL-02
 Pool0
  Spare Pool

                                                             Usable Physical
 Disk             Type   Class          RPM Checksum           Size     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- --------
 1.1.11           FSAS   capacity      7200 block            3.63TB   3.64TB zeroed

Original Owner: Hart2520CL-02
 Pool0
  Root-Data Partitioned Spares
                                                              Local    Local
                                                               Data     Root Physical
 Disk             Type   Class          RPM Checksum         Usable   Usable     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- -------- --------
 1.0.0            BSAS   capacity      7200 block            2.28TB       0B   2.43TB not zeroed

Original Owner: Hart2520CL-02
 Pool0
  Root-Data Partitioned Spares
                                                              Local    Local
                                                               Data     Root Physical
 Disk             Type   Class          RPM Checksum         Usable   Usable     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- -------- --------
 1.0.2            BSAS   capacity      7200 block            2.28TB       0B   2.43TB not zeroed
 1.0.4            BSAS   capacity      7200 block            2.28TB       0B   2.43TB not zeroed
 1.0.6            BSAS   capacity      7200 block            2.28TB       0B   2.43TB not zeroed
 1.0.8            BSAS   capacity      7200 block            2.28TB       0B   2.43TB not zeroed
 1.0.10           BSAS   capacity      7200 block            2.28TB  143.7GB   2.43TB zeroed
8 entries were displayed.

Hart2520CL::> storage disk show -container-type spare
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------

Info: This cluster has partitioned disks. To get a complete list of spare disk capacity use "storage aggregate show-spare-disks".
1.1.11               3.63TB     1  11 FSAS    spare       Pool0     Hart2520CL-02
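
As a quick way to see which spares still need zeroing before the aggr grow, here is a minimal Python sketch (not from the thread) that reads the Status column of "storage aggregate show-spare-disks" text. The function name and the trimmed `SPARES` sample are assumptions for illustration; the rows are copied from the output above.

```python
import re

# Rows copied from the show-spare-disks output above (trimmed).
SPARES = """\
 1.0.11           BSAS   capacity      7200 block            2.28TB  143.7GB   2.43TB zeroed
 1.1.11           FSAS   capacity      7200 block            3.63TB   3.64TB zeroed
 1.0.0            BSAS   capacity      7200 block            2.28TB       0B   2.43TB not zeroed
 1.0.10           BSAS   capacity      7200 block            2.28TB  143.7GB   2.43TB zeroed
"""

def spare_status(text):
    """Map each spare disk name to 'zeroed' or 'not zeroed' (hypothetical helper)."""
    status = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and section titles: only disk rows
        # start with a name like 1.0.11.
        if not fields or not re.fullmatch(r"\d+\.\d+\.\d+", fields[0]):
            continue
        # The status is the last column; "not zeroed" spans two tokens.
        if fields[-2:] == ["not", "zeroed"]:
            status[fields[0]] = "not zeroed"
        else:
            status[fields[0]] = fields[-1]
    return status

pending = [disk for disk, state in spare_status(SPARES).items()
           if state == "not zeroed"]
print(pending)
```

The parser keys off the trailing status tokens rather than column positions, because partitioned spares and whole-disk spares have a different number of size columns.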

Dshrout311 · ‎2020-02-04

Just wanted to give an update: I moved forward with the commands you recommended and everything seems to be working well. I was able to add the 6 disks to NODE1 (running now), and I will have two aggregates with 12 disks each (11 with the spare).
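
For a rough feel of what "11 with the spare" means for capacity, here is a back-of-the-envelope Python sketch (not from the thread). It assumes each aggregate is a single RAID-DP group, with two parity members, and it ignores WAFL and snapshot overhead, so real usable space will be lower. The usable sizes are the ones shown in the spare output earlier in the thread; the function name is hypothetical.

```python
def raid_dp_data_capacity_tb(members, usable_tb_each, parity=2):
    """Space left for data in one RAID group, before filesystem overhead.

    Assumption: a single RAID-DP group, so two members hold parity.
    """
    if members <= parity:
        raise ValueError("need more members than parity disks")
    return round((members - parity) * usable_tb_each, 2)

# Internal disks: 11 data partitions at 2.28TB usable each.
internal = raid_dp_data_capacity_tb(11, 2.28)
# New shelf: 11 whole disks at 3.63TB usable each.
shelf = raid_dp_data_capacity_tb(11, 3.63)

print(f"internal aggr: about {internal} TB before overhead")
print(f"shelf aggr:    about {shelf} TB before overhead")
```

If the aggregates use a different RAID type or more than one RAID group, the parity count changes and so does the result.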

Thank you very very much for everything and your guidance. Running through this has helped me tremendously and I definitely understand the system better.

SpindleNinja · ‎2020-02-04

Happy to Help, glad it worked out!

NetApp U has lots of free training (there is paid training out there as well), but always feel free to read through the docs and TRs. They are also good for self-education.

Menedenz · ‎2024-02-07

Hi, hope you guys are doing OK.

I have a question rather than an answer. An AFF-A250 is about to arrive, and this is my first time setting up a NetApp storage system.

In my case I'll use FC over Brocade SAN switches to connect my servers to the storage, and I want to present two LUNs: one for 3 ESXi servers (side question: does vCenter require more than one LUN for HA?) and the other LUN for 2 Oracle Linux Virtualization servers.

I want to assign all disks to one node and leave the other node as standby. The AFF-A250 contains 24 internal drives. Will they be spares from the factory (not assigned), or will they be assigned to both nodes (using auto-assign)?

Now, if auto-assign did distribute the 24 drives across nodes one and two, how can I reassign disk ownership to one node only? (And based on your experience, is it OK to only utilize one node and leave the other as standby?)