Re: For one and quarter SSD shelf, how to create aggregates

netappmagic · ‎2018-10-18

We have a full shelf plus an additional quarter shelf of SSDs, 30 SSDs in total at 15TB each, with root-data-data partitions already in place on the first full shelf. How do I create an aggregate on each of the two HA nodes? Please help!

bobshouseofcards · ‎2018-10-19

The short answer is just add disks from the second shelf as you normally would. Up to the first two shelves can be Root-Data-Data partitioned. ONTAP will partition the new disks on add and assign the appropriate partition to the first node, then you repeat for an aggregate on the 2nd node.

There are some details to be aware of - this KB article covers them:

https://kb.netapp.com/app/answers/answer_view/a_id/1029912/

Hope this helps.

Bob Greenwald

Senior Systems Engineer | cStor

NCIE SAN ONTAP, Data Protection

netappmagic · ‎2018-10-24

Nice to see you here, Bob!

For a total of 30 SSDs, would one P2 spare and one P3 spare (the data partitions) on each node be enough? That is, each node would have one P2 and one P3 set aside as spares.

bobshouseofcards · ‎2018-10-25

Yes. One P2 and one P3 for the data partitions are enough. You will want two spares for the root partitions per node, one of which can come from the same shared disk that holds the spare P2/P3.

heightsnj · ‎2018-10-25

Sorry, I still have a few follow-ups:

1) Can I assign all P2 to one node, then all P3 to the other? or should I split all P2/P3's to a different node?

2) You are saying just one P2 spare, and one P3 spare for both nodes in the entire HA?

3) After I add/cable the shelf into the stack on the AFF, will partitioning be done automatically for me?

bobshouseofcards · ‎2018-10-27

No worries! And apologies in advance for length.

A bit of background on the topic. When first released for AFF8080, Advanced Drive Partitioning (ADP) was Root-Data style only. So for a full shelf of disks, half the disks were fully assigned to node A and the other half to node B of any HA pair. Additional disks were full disks assigned in the usual way.

When ADP was extended to Root-Data-Data style, the factory disk assignment remained the same. In a single shelf, half of the disks were assigned to each node, with both data partitions on a disk going to the same node. Even if you had two shelves, the pattern was repeated for the second shelf, since now up to two shelves (well, technically 48 disks) of ADP root-data-data are supported.

This standard pattern changed around the ONTAP 9.2 timeframe. It turns out that performance is driven less by the number of partitions than by the number of physical endpoints the node can get to respond to an IO request. For a two-shelf system, for example, and ignoring spares for convenience: if both data partitions are assigned to the same node, each node has 24 physical SSD endpoints and 48 logical partition endpoints. But if all the P2s go to node A and all the P3s go to node B, each node has both 48 physical SSD endpoints and 48 logical partition endpoints. Same data size, better performance by far.
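To make the endpoint counting concrete, here is a minimal Python sketch of the two layouts. The function name and the "old"/"new" labels are made up for illustration and are not anything ONTAP exposes; it also ignores spares, same as the paragraph above.

```python
from typing import Tuple


def endpoints_per_node(num_disks: int, layout: str) -> Tuple[int, int]:
    """Return (physical SSDs, data partitions) each node can drive."""
    # Every root-data-data disk carries two data partitions (P2 and P3),
    # and each node owns half of all data partitions in either layout.
    logical = (2 * num_disks) // 2
    if layout == "old":
        # Old style: a node owns both P2 and P3 on half of the disks.
        physical = num_disks // 2
    elif layout == "new":
        # New style: node A owns every P2 and node B owns every P3,
        # so each node touches every disk.
        physical = num_disks
    else:
        raise ValueError(f"unknown layout: {layout}")
    return physical, logical


for layout in ("old", "new"):
    physical, logical = endpoints_per_node(48, layout)
    print(f"{layout}: {physical} physical SSDs, {logical} data partitions per node")
```

For 48 disks this gives 24 physical / 48 logical endpoints per node in the old layout and 48 / 48 in the new one.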

To appreciate the difference - I had an A700 that shipped with the old configuration. The standard rating on 1.5 shelves of disk was about 20000 32K IOPS per controller from the performance modeler. Switching the same setup to the new partition configuration modeled 40000 32K IOPS. Yup - double the expected potential performance by aligning the partitions with all P2s to node A and all P3s to node B.

[ Side note - for anyone concerned about "low" IOPS numbers, recall the block size. Per controller, 1.5 shelves was estimated originally at 20000 32K IOPS - so 40000 32K IOPS for the HA pair. While you can't exactly multiply by 8, if this were 4K IOPS you are talking in the 250K+ IOPS range for the system. Realigning the partitions put it in the 500K+ range for 4K IOPS sustained across all workloads. ]
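As a rough illustration of that side note, the sketch below computes only the naive "multiply by 8" ceiling, i.e. the 4K IOPS figure that moves the same bytes per second as the 32K figure. The function name is hypothetical, and the result is an upper bound on the conversion, not a modeled number; the 250K+ and 500K+ estimates above sit below it.

```python
def naive_4k_ceiling(iops_32k_per_controller: int, controllers: int = 2) -> int:
    """4K IOPS that would move the same bytes/sec as the given 32K IOPS."""
    bytes_per_sec = iops_32k_per_controller * controllers * 32 * 1024
    return bytes_per_sec // (4 * 1024)


# Old layout: 20000 32K IOPS per controller, two controllers in the HA pair.
print(naive_4k_ceiling(20000))
# New layout: 40000 32K IOPS per controller.
print(naive_4k_ceiling(40000))
```

That works out to ceilings of 320000 and 640000 for the HA pair, which is why the real-world estimates are quoted as "250K+" and "500K+" rather than a straight 8x.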

[ 2nd side note - for anyone who claims other systems go faster for similar disk, recall that ONTAP is designed for multiple workloads in parallel as its sweet spot. It goes faster as you increase the workload count. Systems optimized for single workloads - and you know who they are by the types of benchmarks they publish - will beat AFF and ONTAP on that single workload for sure. But then, if you will only ever run a single workload on an AFF, you are looking at the wrong infrastructure. Look toward an EF-series instead and recheck that comparison. ]

Ok - enough digression. So the short answer to your first question is yes, you should assign all the P2s to one node and all the P3s to the second node.

For your 2nd question - there are two aspects to it. Technically, when you have shared disks, only 1 spare disk is needed between the two nodes to cover the shared disks. The spare is itself pre-partitioned, so it also shows a "shared" container type in the output of a "storage disk show" command - more on that below. If you also add non-partitioned disks, say shelves 3 and 4, then a traditional spare is needed for those disks just like always. This is just like in a storage pool. A set of shared disks needs only one "shared spare". Unshared disks need their own separate spare like usual. That covers the minimal need.

Logistically, you may want more than one spare of any given type. This aspect is governed by how fast you can replace a disk. If you only buy Next Business Day service, for example, and a disk fails on a Friday, you might not have a replacement until Monday. Do you want to run with no spare on an HA pair for the whole weekend? Then again, considering the investment in infrastructure, would you really run your shiny new AFF on NBD support or would you opt for 4-hour (or faster) or perhaps even have 1 replacement disk onsite depending on capacity? Remember that it's still RAID-DP, so you would have to lose at least 3 physical disks in the same raid group before getting one fully replaced. Given SSD reliability compared to spinning disks, odds are 1 shared spare is enough to satisfy the failed disk scenario.

But to reiterate - the one shared spare only applies to the P2 and P3 partitions. In a 2-shelf system, you can have two raid groups per node for the data aggregates - one with 23 partitions and one with 24 partitions. The last disk is the P2/P3 spare for both nodes. You cannot do that with the root aggregate - the P1s. These need two spares per node for proper protection. Raid group size 22 is appropriate. The P1 on the "shared spare" can be one of the four required P1 spare partitions. The rest come from other P1s.
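The numbers in that layout can be checked with a short Python sketch. It assumes a full two-shelf system of 48 root-data-data disks with the P1s split evenly between the nodes (my reading of the layout above); the function and key names are hypothetical.

```python
def two_shelf_layout(num_disks: int = 48,
                     shared_spare_disks: int = 1,
                     root_spares_per_node: int = 2) -> dict:
    """Per-node partition counts for a root-data-data HA pair."""
    # Data side: one disk is held back as the shared P2/P3 spare;
    # every other disk gives one data partition to each node.
    data_partitions_per_node = num_disks - shared_spare_disks
    rg_small = data_partitions_per_node // 2
    rg_large = data_partitions_per_node - rg_small
    # Root side: assume P1s are split evenly, and each node keeps
    # its own root spares out of its share.
    root_partitions_per_node = num_disks // 2
    root_rg_size = root_partitions_per_node - root_spares_per_node
    return {
        "data_raid_groups_per_node": (rg_small, rg_large),
        "root_raid_group_size": root_rg_size,
        "root_spares_total": 2 * root_spares_per_node,
    }


print(two_shelf_layout())
```

With the defaults this yields data raid groups of 23 and 24 partitions per node, a root raid group size of 22, and four P1 spares in total, matching the figures above.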

For your third question - assumption is that you have at least one shelf where all the disks on that shelf are partitioned, whether it's a full shelf or not. You then add disks either by filling that shelf or adding a shelf. The new disks are not partitioned automatically. However, if you add a "disk" to the existing data aggregates, and the raid group to which the "disk" would be added contains partitions, the disk you are about to add will automatically be partitioned and only one partition from the added disk will get added to the aggregate. If by adding this disk you now have no shared spare disk, ONTAP will automatically partition a regular SSD to act as the new shared spare.

This expansion can get tricky. Beware of which raid group will get the new disk, especially as you build out to two shelves, as the actual number of disks in the data aggregate raid groups is not consistent. Also beware of your raid group size so you don't accidentally create a raid group you didn't mean to.

Also, much like how disks can be manually replaced through ONTAP commands, partitions can also be manually replaced one for another. I used this mechanism one partition at a time to convert a live system from the old style partition assignment to the new style partition assignment to get the benefit of higher performance. One partition at a time - elapsed time was about 2.5 weeks, but granted I wasn't working it 24 hours a day either. Doable, albeit very tedious.

Hope this helps you.

Bob Greenwald
Senior Systems Engineer | cStor
NCSE | NCIE SAN ONTAP, Data Protection

Kudos and/or accepted solutions are always appreciated.