Re: Recreate 64-bit aggregate

kumaraysun · 2015-06-25

Hi,

Currently we have 96 SAS disks on each NetApp controller, of which 3 are spares. That leaves 93 disks, from which we created 3 aggregates of 31 disks each. Each aggregate has two raid groups, one of 14+2 and one of 13+2 disks (6 raid groups in total). This layout was created when 16TB was the limit for each (32-bit) aggregate. I am planning to recreate it as a single 64-bit aggregate of 5 raid groups, four of 16+2 and one of 17+2 (91 disks), with 2 spares for that aggregate. That saves 3 disks in total and should give better throughput. Can someone suggest whether this is a recommended way of doing it? The filer is running 7-Mode, not Cluster-Mode.
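One quick way to check the disk arithmetic in this plan is to tally data, parity, and spare disks for both layouts. This is a minimal sketch that assumes RAID-DP (two parity disks per raid group). `tally` is a hypothetical helper, not a NetApp command.

```python
# Per-controller disk accounting, assuming RAID-DP (2 parity disks per raid group).
PARITY_PER_RG = 2


def tally(rg_data_disks, spares):
    """Return (data, parity, spares, total) for a list of per-raid-group data-disk counts."""
    data = sum(rg_data_disks)
    parity = PARITY_PER_RG * len(rg_data_disks)
    return data, parity, spares, data + parity + spares


# Current: 3 aggregates, each with one 14+2 and one 13+2 raid group, plus 3 spares.
current = tally([14, 13] * 3, spares=3)
# Proposed: one 64-bit aggregate with four 16+2 groups and one 17+2 group, plus 2 spares.
proposed = tally([16, 16, 16, 16, 17], spares=2)

print("current: ", current)   # (81, 12, 3, 96)
print("proposed:", proposed)  # (81, 10, 2, 93)
print("disks freed:", current[3] - proposed[3])  # 3
```

Both layouts hold 81 data disks. The gain is the 3 disks (2 parity, 1 spare) that are no longer tied up, which are then free to be added as data disks.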

JGPSHNTAP · 2015-06-25

The assumption is that you have moved all the data off of your current setup.

96 disks is 4 shelves. Are you planning on having a separate aggr0, or rolling aggr0 into the new aggregate? I tend to prefer a separate aggr0, but that's pure preference at this point.

Also, are those 96 disks on each controller of an HA pair?

kumaraysun · 2015-06-25

Thanks for the reply,

We have not moved all the data, but we have another 3140 which can act as a standby.

Vol0 is on the same aggregate. All 96 disks belong to a single controller; we have 192 disks in total.

JGPSHNTAP · 2015-06-25

OK, I'm a fan of even raid groups. I'm willing to give up some spares to make this work.

But I also like to break out vol0, as I said earlier.

18 * 5 = 90 disks

3-disk aggr0

3 spares

That's my preference.
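As a sanity check, the suggested layout accounts for all 96 disks on a controller. This sketch assumes RAID-DP throughout, so the 3-disk aggr0 holds 1 data disk and 2 parity disks.

```python
# Suggested per-controller layout, assuming RAID-DP (2 parity disks per raid group).
RG_SIZE = 18       # disks per raid group in the data aggregate (16 data + 2 parity)
RG_COUNT = 5
PARITY_PER_RG = 2
AGGR0_DISKS = 3    # dedicated root aggregate: 1 data + 2 parity
SPARES = 3

data_aggr_disks = RG_SIZE * RG_COUNT
data_disks = (RG_SIZE - PARITY_PER_RG) * RG_COUNT
total = data_aggr_disks + AGGR0_DISKS + SPARES

print(data_aggr_disks, data_disks, total)  # 90 80 96
```

The data aggregate ends up with 80 data disks, one fewer than the 81 in the current layout.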

kumaraysun · 2015-06-25

Thanks again,

Your solution is great, but with your suggestion I don't see much of an increase in usable capacity. My requirement here is to get additional space without any hardware upgrade.

bobshouseofcards · 2015-06-25

You said you have 450GB disks? Allowing for right-sizing and other usable-capacity factors, even your original suggested config, which frees up 3 more disks, only gets you a little over 1TB of additional potential space.
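The "little over 1TB" figure can be reproduced roughly. This sketch assumes about 418 GB of right-sized capacity per 450GB disk and a 10% WAFL reserve. Both numbers are assumptions here, so substitute the values your filer actually reports. The sketch also ignores any aggregate snapshot reserve.

```python
# Rough usable space gained by turning 3 freed 450 GB disks into data disks.
RIGHT_SIZED_GB = 418  # assumed right-sized capacity of a 450 GB SAS disk
WAFL_RESERVE = 0.10   # assumed WAFL reserve fraction


def usable_gb(data_disks: int) -> float:
    """Usable GB for a number of data disks after right-sizing and WAFL reserve."""
    return data_disks * RIGHT_SIZED_GB * (1 - WAFL_RESERVE)


extra = usable_gb(3)
print(f"~{extra:.0f} GB per controller, ~{2 * extra:.0f} GB per HA pair")
```

Under these assumptions the result is roughly 1.1TB per controller, or about 2.2TB across both heads.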

How much additional space were you hoping to get through layout changes?

I'd go along with the previous suggestion: take those three disks and separate out an aggregate for vol0. In 7-Mode it isn't required, but I can point to specific operational and performance examples I've encountered where it makes, or would have made, a difference. With the new layout, you essentially get that for free.

kumaraysun · 2015-06-25

Many thanks for your reply.

You are right, it's 1TB per controller, so I would be getting 2TB in total on the filer. We never had a separate aggregate for vol0. Could you give some pointers on whether creating aggregates this way would cause any performance problems for me? Or is there any other way I can get more free space?

bobshouseofcards · 2015-06-26

Creating a dedicated root aggregate will not create performance problems for you. It will actually improve your day-to-day performance a little. At the scale of your filers we are talking fractions of a percentage point here, but the improvement is there, because the root volume I/O load is kept off the data aggregates. This effect becomes more pronounced as both disk capacity and disk count increase, and especially so if you start using cDOT instead of 7-Mode at any point. In fact, on cDOT a dedicated root aggregate for each node is a requirement rather than an option. While a three-disk root aggregate is sufficient there, on high-capacity disk systems I actually use 5 disks for my root, which makes a noticeable performance difference. But I digress a little.

The biggest reason for a dedicated root aggregate on 7-Mode systems, in my mind, is operational. Consider the case where a data or operational issue causes corruption on an aggregate. The more you throw at an aggregate (data loads, dedupe, snapshots, whatever), the larger the footprint in which you might hit a bug or issue that causes corruption. That's just simple math: more disks, more load, more features, more potential for a problem. And despite NetApp's (or anyone's) best efforts, such bugs do creep in from time to time, especially as releases evolve. We've all seen them at some point.

So, suppose you make the biggest single aggregate you can, then put your "boot" volume on that aggregate. Something in your system, operations, or data mix causes corruption in the aggregate. You go to restart your filer, and it doesn't boot because the aggregate won't come online. You try to fail over to the other head, and you can't, because the aggregate won't come online. Now you are spending a lot of time in maintenance mode with support, trying to do something, anything, to recover, with only limited access to the system logs and history of what went wrong.

Now, suppose you have 90-odd disks in data aggregates and 3 in a root aggregate. The root aggregate contains nothing but the root volume: no shares off the root, no space efficiency, no special features other than a few snapshots to protect recent changes. If you were to have an issue, which aggregate do you think it would hit? Granted, if you only have the root and one other aggregate, a problem that hit the big aggregate would still leave you pretty much offline. But now you can still boot and fail over. You have full and easy access to logs to diagnose the issue. You can get to any operational mode, so the entire set of corrective actions that might be needed is available to you, even if the same amount of data is offline. You can push out firmware or other fixes if needed. And if you ever had more than one data aggregate per controller, the other aggregates and their volumes would, or at least could, stay online while you work on the failed one. Additionally, if you ever have an issue with the root aggregate, you have a few spares you can use to build a completely new root aggregate and volume without worrying about any of the data volumes at all. Granted, you could create a new root with a single aggregate as well, but it's just cleaner to have the separation.

Some will argue that the space "wasted" by a three-disk dedicated root is too high a cost. For example, consider a controller where all the disks might be 4TB or more. That's 12TB of physical capacity when you only need 200-300GB or so, and it's expensive, so why waste it? I counter that if it was important enough to purchase high-end enterprise-class storage to begin with, then nickel-and-diming over a couple of disks is questionable. On one of NetApp's smaller systems I can see the concern, and NetApp has addressed that in current Data ONTAP releases, but when you are in the 200-disk range the argument loses validity in my opinion.

As for eking out that last bit of space, I am for doing whatever works for you. For example, on my systems I plan for NL-SAS/MSATA high-capacity disks at a raid group size of 20 (the max for the type). That storage is designed just to house big data, so this maximizes usable space in the system. As I expand shelves, I hold extra spares as needed so I can build out full-size raid groups on the next expansion, because there always is one. For performance disks, I'll go higher than the "standard" raid group sizes as well.
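The space side of this trade-off is easy to quantify. With RAID-DP, a raid group of n disks keeps n - 2 of them for data, so larger groups spread the two parity disks over more data disks. The sizes below are only sample values. The maximum raid group size depends on disk type and Data ONTAP release.

```python
# Fraction of a RAID-DP raid group that holds data, for a group of n disks.
PARITY_PER_RG = 2


def data_fraction(n: int) -> float:
    """(n - 2) / n: share of the raid group's disks that store data."""
    return (n - PARITY_PER_RG) / n


for n in (14, 16, 18, 20, 22, 24):
    f = data_fraction(n)
    print(f"raid group size {n:2d}: {f:.1%} data, {1 - f:.1%} parity")
```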

There is a trade-off with larger raid group sizes. First, the larger the raid group, the longer the rebuild time when a disk fails. Also, beyond a certain point, larger raid group sizes do impact regular operational performance. NetApp achieves its best performance by reading and writing full raid stripes across an entire raid group. At a certain raid group size, performance will hit an inflection point and start to decrease. The inflection point varies based on controller model, disk type, and load patterns. The only way to find the performance-optimal raid group size for your configuration is to actually test it yourself, which is of course rather difficult for most people to do. Fewer raid groups can also affect performance, since each raid group gets busier as load increases.

For 96 disks per controller, the recommendation was 3 disks for aggr0, 5x18 for aggr1, and 3 spares. That config yields 80 data disks (5 x 16 data disks per raid group). That's a really good design for your system in my opinion. Another way to get 80 data disks is to use 4 raid groups of 22 disks each. That yields 80 data disks (4 x 20 per raid group), uses 88 disks for aggr1, still has 3 disks for aggr0, and leaves 5 spare disks. Same space, so what would be the differences?

Well, fewer raid groups means potentially lower performance at the upper bounds, because the load contends for one fewer raid group. A larger raid group size means longer disk rebuild times and a bigger performance impact on the whole raid group during the rebuild. And with more disks per group, the statistical chance of multiple failures hitting a single raid group is higher, though we are talking tiny fractions here, of course.
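A toy binomial model shows why larger raid groups carry a small extra risk. It assumes that, after one disk fails, each surviving disk in the same RAID-DP group independently fails during the rebuild window with a hypothetical probability p. Data is lost only if two more disks fail before the rebuild completes. The value of p is made up for illustration. Real failures are not independent and depend on disk age, type, and rebuild time.

```python
from math import comb


def p_double_extra_failure(n: int, p: float) -> float:
    """P(at least 2 of the other n - 1 disks fail during one rebuild window).

    Toy model: independent failures with a hypothetical per-disk probability p.
    RAID-DP survives one extra failure, so 2 extra failures means data loss.
    """
    m = n - 1
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(2, m + 1))


P_HYPOTHETICAL = 0.001  # made-up per-disk failure probability per rebuild window
for n in (18, 22, 23):
    print(f"raid group size {n}: {p_double_extra_failure(n, P_HYPOTHETICAL):.2e}")
```

For small p the result grows roughly with the number of disk pairs, (n - 1)(n - 2)/2, so it stays tiny in absolute terms. The model also leaves out the longer rebuild time of larger groups, which would push p up further.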

So why might you use a larger raid group size now, when it doesn't have any real advantage today? Think about your next disk purchase. If you have a rough idea of how many shelves or disks you might add to this system the next time you buy storage, which raid group size makes more sense for that expansion? One of the two numbers (18 or 22) might work better going forward. Then again, from what you describe and given the age of the system, there is likely not going to be a storage expansion on this box, in which case a raid group size of 18 makes a lot of sense for now.
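One way to weigh the two sizes against a future purchase is to see how evenly the new disks divide into full raid groups. This sketch assumes 24-disk shelves of the same disk type, so the existing spare pool covers them. `fit` is a hypothetical helper. Leftover disks would become extra spares or a smaller raid group.

```python
DISKS_PER_SHELF = 24  # assumption: depends on the shelf model


def fit(shelves: int, rg_size: int) -> tuple:
    """Return (full raid groups, leftover disks) for the new shelves."""
    return divmod(shelves * DISKS_PER_SHELF, rg_size)


for shelves in (1, 2, 3):
    for rg_size in (18, 22):
        groups, leftover = fit(shelves, rg_size)
        print(f"{shelves} shelf(s), raid group size {rg_size}: "
              f"{groups} full groups, {leftover} disks left over")
```

Under these assumptions, three shelves split into exactly four 18-disk groups (72 = 4 x 18), while 22-disk groups leave 6 disks over.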

If you are dead set on maximum space, a raid group size of 23 (4 raid groups, 84 data disks, 8 parity disks, 4 spares) will get you that. But it requires a single aggregate, which runs right up against both the single-aggregate and the large raid group size concerns listed above. I'd stay with 5x18 myself, with the dedicated root aggregate.
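The three per-controller layouts discussed above can be compared side by side. This is a minimal sketch assuming RAID-DP and 96 disks per controller. The 23-disk option keeps vol0 inside the single data aggregate, so it has no separate root disks.

```python
PARITY_PER_RG = 2
TOTAL_DISKS = 96


def summarize(rg_size: int, rg_count: int, root_disks: int) -> tuple:
    """Return (data, parity, spares); parity counts only the data aggregate,
    and root_disks is the size of a dedicated aggr0 (0 if none)."""
    data = (rg_size - PARITY_PER_RG) * rg_count
    parity = PARITY_PER_RG * rg_count
    spares = TOTAL_DISKS - rg_size * rg_count - root_disks
    return data, parity, spares


layouts = {
    "5 x 18 + 3-disk aggr0": (18, 5, 3),
    "4 x 22 + 3-disk aggr0": (22, 4, 3),
    "4 x 23, vol0 in data aggr": (23, 4, 0),
}
for name, params in layouts.items():
    data, parity, spares = summarize(*params)
    print(f"{name}: {data} data, {parity} parity, {spares} spares")
```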

You are at the limits of what you will get out of this particular system without adding physical space. I hope that this post helps to explain some of the design considerations that go into a good NetApp system design.

kumaraysun · 2015-06-26

Many thanks for the elaborate explanation.

kumaraysun · 2015-06-26

Maybe you could answer this here: I also plan to add more capacity, say 10TB for example. In that case, the latest disks available from NetApp are 900GB disks, whereas my existing setup has all 450GB disks. Is it wise to create a new aggregate for the additional capacity, or is it okay to add the new disks to the existing aggregate with a mix of 450GB and 900GB disks? If I create a new aggregate for the disks I am adding, I would be wasting disks on parity and spares. If I mix them into the existing aggregate, I really don't know whether I would be able to use the full 900GB of each disk or only 450GB of it. Also, should I add the 12-15 disks to one controller (in which case I would create an imbalance between my controllers)? Maybe someone can give some pointers.

My setup has a strict budget constraint, with no compromise on either cost or performance.

bobshouseofcards · 2015-06-26

Always use the same size disks within an aggregate. So if you are getting a new disk size, start a new aggregate for those disks. It's possible to mix, but it's not a good idea. At the very least you will create "zones" of performance imbalance that you won't easily be able to measure. Others may have a different opinion; that's mine.

You're not wasting disks on parity and spares by creating a new aggregate. Raid groups can only contain one disk type to begin with (see note below), so at the very least you need to start a new raid group anyway. Also, for each disk type you want spares of that type. A larger disk can become a spare for a smaller disk in a pinch, but then the larger disk is limited to the size of the smaller disk it replaces (to match all the other disks in that raid group), and you never get that space back short of a lot of hassle. So there really isn't any "waste" in the sense you describe. One option is a new raid group in a new aggregate, which needs 2 parity disks and at least 1 spare. The other is one raid group of the new disks in an existing aggregate, which also needs 2 parity disks and 1 spare. There is no difference in disks used, and with 2 aggregates you get a clear performance separation that you know about up front.
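Here is a rough count of how many 900GB disks a new raid group needs for a given usable target. The right-sized capacity (836 GB) and the 10% WAFL reserve are assumed figures, not values from this thread, and TB is taken as 1000 GB. As the post describes, the 2 parity disks and 1 spare are needed whether the group goes into a new aggregate or an existing one.

```python
import math

RIGHT_SIZED_GB = 836  # assumed right-sized capacity of a 900 GB disk
WAFL_RESERVE = 0.10   # assumed WAFL reserve fraction
PARITY_PER_RG = 2     # RAID-DP
SPARES = 1


def disks_for(target_tb: float) -> tuple:
    """Return (data disks, total disks incl. parity and one spare) for a usable target in TB."""
    per_disk_gb = RIGHT_SIZED_GB * (1 - WAFL_RESERVE)
    data = math.ceil(target_tb * 1000 / per_disk_gb)
    return data, data + PARITY_PER_RG + SPARES


print(disks_for(10))  # (14, 17) with the assumed figures
```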

Sounds like you are only getting one additional shelf of 900s? You can split disk ownership of a single shelf between the controllers in an HA pair if you need to balance load exactly across the controllers. However, if you can balance the incoming load across the controllers that well, wow. Tell me how!

My personal preference is to let a controller own a full shelf of disks and to balance by using pairs of shelves. Performance from a particular controller is slightly better if that controller owns all the disks within a shelf. This style also maximizes space if you are only getting one shelf. If you split one shelf between controllers, then to use all the disks you now need two additional raid groups and spares on both controllers, which uses more of the available disks.
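The extra disk cost of splitting a shelf can be counted directly. This sketch makes four assumptions:

- one 24-disk shelf of 900GB disks
- RAID-DP
- one spare per controller that owns disks of this type
- each controller puts its whole share into a single raid group, which must stay within the maximum raid group size for the disk type

```python
PARITY_PER_RG = 2  # RAID-DP
SPARES_PER_CONTROLLER = 1


def data_disks(shares):
    """Data disks when each controller builds one raid group from its share of the shelf."""
    return sum(disks - SPARES_PER_CONTROLLER - PARITY_PER_RG for disks in shares)


print(data_disks([24]))      # one controller owns the whole shelf: 21
print(data_disks([12, 12]))  # shelf split across the HA pair: 18
```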

kumaraysun · 2015-06-26

Hi, could you help me with the raid group layouts I can form, with more spindles, for the 450GB and 900GB aggregates, balanced across the controllers?

JGPSHNTAP · 2015-06-25

I'm not quite sure I understand your goal.

What size disks do you have?

Limit on 3140 with 8.2.3 is 150TB

kumaraysun · 2015-06-25

I have 450GB disks, and the plan is for the 3160, not the 3140.

JGPSHNTAP · 2015-06-25

Even better, higher limits. I'm not seeing the issue.

kumaraysun · 2015-06-25

Thanks for your suggestion.