Can a spare disk be picked up by the other node?

heightsnj · ‎2018-02-28

Assume I have a two-node HA cluster, where each node has an aggregate and also owns a spare disk. If one spare has already been used and a second disk then fails, can the spare owned by the other node be picked up automatically for the second failure?

I think the answer is yes, but I want to confirm with you.

Vmtech · ‎2018-02-28

Hello,

Disks are owned by individual nodes (the nodes are part of an HA pair).

If the failed disk is owned by Node A, then Node A will use a spare disk assigned to it (conditions apply).

Can you please check the ownership of the second spare disk that you have? Is it owned by Node A or Node B? Use the command aggr status -s or disk show -v <disk name>

heightsnj · ‎2018-02-28

OK. So the answer is no: a node cannot use a spare that is owned by the other node.

2nd question: Can a spare be picked up by any of the aggregates on the same node if a disk in one of those aggrs fails, assuming the disk characteristics are all the same?

Vmtech · ‎2018-02-28

You can use the spare disk from the other node, but to do that you will need to reassign ownership of the spare disk from Node B to Node A. This also means that Node B will have no spare disks available.

2nd question: Yes, you are right.

heightsnj · ‎2018-02-28

Here comes a tougher question, which I didn't intend to ask at the beginning.

I have two AFF nodes in the cluster, and each node already has 28 SSDs (3.8 TB each) filling up the first raid group in its aggr. Now, if I add one more SSD shelf with 24 disks, I can either:
1. Add 12 disks to the existing aggr on each node to form a 2nd raid group; 4 disks in total will be used for parity.

2. Form a new aggr/raid group with all 24 disks on one of the two nodes; 2 disks will be used for parity.

No extra spares are needed, because each node already has its own spare. Option 2 will give about 6 TB more than option 1.

Which option would you go with? I would probably go with option 1, only because it balances well, although it loses about 6 TB of usable space. The only open question is whether there is any benefit to keeping the entire shelf in one aggr rather than splitting it across two different aggrs.
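
The parity arithmetic behind the two options can be sketched in a few lines of Python. This is only a rough illustration: it assumes RAID-DP (2 parity disks per raid group, which matches the parity counts quoted above) and works in raw capacity using the nominal 3.8 TB disk size, so it ignores right-sizing and file system reserves. The function and variable names are made up for the example.

```python
DISK_TB = 3.8        # nominal disk size from the post (raw, not usable)
PARITY_PER_RG = 2    # assumption: RAID-DP, 2 parity disks per raid group


def data_disks(total_disks, raid_groups):
    """Disks left for data after each raid group gives up its parity disks."""
    return total_disks - raid_groups * PARITY_PER_RG


# Option 1: 12 disks per node, one new raid group on each of the 2 nodes.
option1 = data_disks(24, raid_groups=2)
# Option 2: all 24 disks in a single new raid group on one node.
option2 = data_disks(24, raid_groups=1)

# Raw capacity gained by option 2 over option 1 (2 fewer parity disks).
extra_raw_tb = (option2 - option1) * DISK_TB
print(option1, option2, round(extra_raw_tb, 1))
```

The raw difference is two disks' worth of capacity; the roughly 6 TB figure quoted above is presumably what remains of that after formatting overhead.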

Vmtech · ‎2018-02-28

Hmm... Let me ponder over it.

What version of ONTAP are you running?

Can you please paste the output of 'aggr show <aggr name> -instance'?

heightsnj · ‎2018-03-01

9.1p5

What other information about the aggr would you like to know?

Use the whole new SSD shelf to create a new aggr, or add the disks to the existing aggrs: that is the question.

AlexDawson · ‎2018-03-01

Either option is valid.

With SSDs, raid group sizing doesn't have to be as uniform as with spinning disks, because latency differences between raid groups caused by uneven utilization are not as much of an issue. We still want to avoid "hot spindles", but the threshold is much higher.

Is controller CPU utilization about equal? If so, add to both.
Is one controller more heavily utilized? If so, add to the other.
Is there a uniform need for more capacity? If so, add to both.
Is there a targeted need for more capacity? If so, add to that controller.
Do you anticipate budget in the remaining lifespan of the system for another expansion? If so, add to one this time and to the other next time. Even if not, consider adding to only one this time to get the best capacity.

heightsnj · ‎2018-03-02

Thanks for the details.

Just to make sure: is it true that we only need one spare on each node, and that there is no need to add any more even after adding this new shelf? Including this new shelf, we will have a total of three and a half shelves for this AFF HA pair.

What ratio of SSD spares to the total number of SSDs or shelves should be configured?

AlexDawson · ‎2018-03-02

It's up to every admin to decide what they're comfortable with. My usual recommendation is 1 + 1% per type per controller, so if you have 24 disks, or 72, 2 spares per controller are fine; at 144 drives, you probably want 3. But if you're physically close enough to the system to replace drives as soon as they come in, maybe you're happy with 1 per controller, or if it takes 4 weeks to get replacement drives to it because it's on a ship in the middle of the ocean, maybe you want 6.
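
One way to read the "1 + 1%" rule of thumb is one spare plus 1% of the drive count, rounded up. The rounding is an assumption inferred from the three examples above (24 and 72 drives give 2, 144 gives 3), and `recommended_spares` is a hypothetical helper for illustration, not an ONTAP tool.

```python
def recommended_spares(drive_count):
    """Spares per controller for one drive type: 1, plus 1% rounded up."""
    # Integer ceiling of drive_count / 100 (assumed rounding, see above).
    one_percent_rounded_up = -(-drive_count // 100)
    return 1 + one_percent_rounded_up


for n in (24, 72, 144):
    print(n, recommended_spares(n))
```

This is only a starting point; as noted above, how quickly replacement drives can reach the system may justify fewer or more spares.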

heightsnj · ‎2018-03-30

AlexDawson

In the ADP root-data partitioning case, a hot spare could be a spare disk partition, and it counts toward the number of spares that should be defined. Correct?

We can use a whole disk as a hot spare for a partitioned disk, to replace the partitioned disk should one of its partitions fail. But usually, for partitioned disks we should use a root or data partition as the hot spare, and for unpartitioned data disks we should use an unpartitioned disk as the hot spare. Correct?

AlexDawson · ‎2018-04-02

Yes. In theory only one partition may fail, but in practice it's probably both/all partitions on a disk that fail. After that, the root partition may be rebuilt onto different disks than the data partitions.

On systems with mixed partitioning, a partitioned drive will not automatically take the place of an unpartitioned drive that fails.

I hope I have understood your question correctly.

heightsnj · ‎2018-04-04

Thanks for your patience!

One last question.

Suppose I initially have two shelves of SSDs, with an aggr on each AFF node, and the disks are root-data partitioned.

My question: for any subsequent shelf expansions, should I keep the new disks partitioned, or should I start configuring them as unpartitioned disks/raid groups?