ONTAP Discussions

Advice requested on configuring an aggregate layout.

STEVIEOH

Howdy!  First, the situation:

I have just experienced the joy of Promotion-By-Departure. My predecessor apparently experienced a mid-life crisis (or possibly a sudden moment of clarity) and up and quit on us, and now I'm in charge of this project. I, who have absolutely zero experience with NetApp. (My only SAN experience thus far has been with an EMC).  No notes, no documentation, no nothing.

I was left with what look like two controller units that say FAS3070 and three disk shelves with 14 disks each (can't find an obvious model designator).  The controller units are labeled "NodeA" and "NodeB". The shelves are numbered 1 to 3. The controller units appear to have been (almost) completely unconfigured.

The disks are each 450GB according to the label.

When I brought up the configuration program (the website thing written in Java that runs on 127.0.0.1), it listed 34 disks with labels like '0a.16' and '0c.51'. I assume that '0a' is Shelf 1 and '0c' is Shelf 3 and I need to activate Shelf 2 somehow.

Now, the questions:

(1) Based on what I was able to gather, my understanding is that the NetApp world is built like this:

* Disks are aggregated together to form RAID Groups (RGs) which are just like a regular RAID array.

* RGs are combined to form Aggregates, just like a Windows "Dynamic Disk" or an LVM2 "Volume Group"

* (This is where things get iffy) My understanding is that LUNs and whatnot are carved out from an Aggregate, in much the same way that virtual machine "drives" exist as files sitting on a regular filesystem.

Is this accurate?

(2) Do I need to have a "reserved" aggregate for the NetApp itself? For example, the EMC I work with has three disks in a RAID-5 configuration (I think; it might be two disks in RAID-1. My memory's fuzzy) reserved for its own internal use.

(3) Finally, and this is where I'm completely lost: What are some good ways to set this up, and how should I decide which one is best for me?

(Note: I've read http://communities.netapp.com/message/2676 so please don't simply say "go look there". I read that thread, but didn't understand the rationale behind most of the assertions.)

Once again, so nobody has to scroll: Two "Nodes" (FAS3070), three shelves with 14x450GB disks each (total 42 disks).

The first concern is how many RAID groups to create. How should I decide this? My first thoughts are reliability and performance.

I can imagine several scenarios, but here are three of them:

If I create one huge 41-disk RG with 1 spare, then I have the following:

* 39 disks worth of space (2 lost due to parity)

* 1 spare disk

* if 3 of those 41 disks fail I'm in for a very bad day

* if 1 or 2 of those 41 disks fail I'm in for a rough day because it has to read the data off of the other disks to regenerate the missing data

* a full-stripe write will necessarily involve all 41 spindles which could theoretically block up reads

If I create three 13-disk RGs with 1 spare each, then I have the following:

* 33 disks worth of space (6 lost due to parity)

* 3 spare disks

* if a disk fails, then only that RG has trouble (right?) Reads that pull from the other two RGs will perform normally

* a full-stripe write hits 13 spindles; reads only have a 1/3rd chance of contending with a write (right?)

I suppose the extreme far end would be eight 4-disk RGs (leaving 10 spares), which gives me

* 16 disks worth of space (16 lost due to parity)

* 10 spare disks

(I don't think I can create a 3-disk RAID-DP. I haven't tried. If I can, however, then that's hilarious.)

Another concern: Should I stuff everything into one big aggregate or create multiple aggregates? What are the pros and cons?

hunne

Hi Stephen,

Not sure if I should say congratulations on your promotion or not, but I tend to err on the optimistic side, so let's go with that.

It's been a couple of years since I have configured a system, so I am perhaps not as up to date as many of the people on this forum from our PS and SE community, but I'll have a go at answering your questions below, in-line.

(1) Based on what I was able to gather, my understanding is that the NetApp world is built like this:

* Disks are aggregated together to form RAID Groups (RGs) which are just like a regular RAID array.

Correct - you get two choices - RAID-DP or RAID 4.  RAID 4 should probably be considered a "legacy" feature - not too many reasons why you would consider it these days.

* RGs are combined to form Aggregates, just like a Windows "Dynamic Disk" or an LVM2 "Volume Group"

An aggregate can be made up of one or more RAID groups, up to a maximum size determined by your disk count and the controller model.  The maximum aggregate/volume size (depending on model and ONTAP version) is 100TB.
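
To make that concrete, here's a rough sketch of what it looks like from the 7-Mode command line on a system of that vintage. The aggregate name and disk counts are just placeholders, so check the aggr man page on your ONTAP version before running anything:

  filer> sysconfig -r                            (show existing RAID groups, spares and disk names)
  filer> aggr create aggr1 -t raid_dp -r 16 32   (32 disks, RAID-DP, RAID group size of 16)
  filer> aggr status -r                          (confirm how the RAID groups were laid out)

The -r option sets the RAID group size, so the 32 disks in that example would land as two 16-disk groups (2 parity + 14 data each).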

* (This is where things get iffy) My understanding is that LUNs and whatnot are carved out from an Aggregate, in much the same way that virtual machine "drives" exist as files sitting on a regular filesystem.

Is this accurate?

The next layer above an aggregate is where we start to configure "containers" that actually store data and can be accessed by users and applications.  The first thing you would do after creating an aggregate is create a Flexible Volume, or FlexVol.  FlexVols are laid out across every single data disk within an aggregate regardless of size, so the general rule is the more data disks you have the better.  They can be resized on the fly, so don't worry too much if you make them too big or too small - you can change them very easily and non-disruptively.
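
For example (again a sketch only - the volume name and sizes are placeholders):

  filer> vol create vol1 aggr1 500g   (create a 500GB FlexVol called vol1 in aggregate aggr1)
  filer> vol size vol1 +200g          (grow it by 200GB on the fly)
  filer> vol size vol1 400g           (or set an absolute size - shrinking works too, as long as the data still fits)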

You can also create quota trees, or qtrees, within volumes, which just appear as normal directories to clients when they browse the volume.  A simple way to think about a qtree, as opposed to a "normal" directory created by a client that mounts the volume via CIFS or NFS, is that a qtree is recognised by Data ONTAP (the NetApp OS) and can be manipulated directly from the storage system - you can assign quotas to it, configure replication on it, or export/share it on its own.

A directory created from a client has no visibility from within Data ONTAP - all the storage system "sees" is that some inodes have been consumed - so there's no way to manipulate it from the storage system's point of view.  Functionally, a qtree versus a plain directory is neither here nor there as far as a client or application is concerned, but qtrees can be very useful if you want to replicate a single directory (qtree) from within a larger volume using NetApp functionality such as SnapMirror and SnapVault.  LUNs are the next logical "container", and they are basically recognised by ONTAP as one big file inside a volume.  You would only configure a LUN if you are going to access the storage as a block device - if you just want volumes to share out CIFS or NFS data, there's no need for a LUN on top of the volume.  (There's a rough command sketch after the summary below.)

So, in summary the way storage is configured is as follows:

  • RAID groups
  • Aggregate
  • Volume (can be directly shared/exported via CIFS/NFS.  Volumes are also the Snapshot "boundary")
  • Qtree
  • LUN
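
To put some (made-up) names on those layers, it looks something like this - only bother with the lun create step if you actually need block access over FC or iSCSI:

  filer> qtree create /vol/vol1/projects                 (a qtree inside the vol1 volume)
  filer> cifs shares -add projects /vol/vol1/projects    (share that qtree directly over CIFS)
  filer> lun create -s 100g -t windows /vol/vol1/lun0    (a LUN, which ONTAP stores as one big file in vol1)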

(2) Do I need to have a "reserved" aggregate for the NetApp itself? For example, the EMC I work with has three disks in a RAID-5 configuration (I think; it might be two disks in RAID-1. My memory's fuzzy) reserved for its own internal use.

No.  The storage system requires a root volume to store all of the configuration data for the appliance, and this can "live" in an aggregate that contains other volumes as well.  It's called vol0 by default and should already be visible on your system.
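
A quick way to check (from memory, so double-check the exact output on your system):

  filer> vol status            (lists all volumes - the root volume is flagged with the "root" option)
  filer> vol container vol0    (shows which aggregate the root volume lives in)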

You can, however, configure your storage system with a dedicated RAID-DP (or RAID 4) aggregate of 3 (or 2) disks just for the root volume, and there are a couple of schools of thought on why you would do it one way or the other.  Having your root volume as part of a larger aggregate is much more space efficient, as you are not giving up parity and data disks just to hold a minimal amount of configuration data.  That's the up-side.

The down side is that if you ever had to perform an offline file system check due to corruption, it would take much longer to complete and get the system back online than if your root volume lived in its own small aggregate.  During a file system check the system runs in what is called "maintenance mode", where it is offline as far as clients/apps are concerned, and every block of every disk in the aggregate holding the root volume must be examined before it can come back.  A dedicated root aggregate of 1 data disk (plus 1 or 2 parity disks depending on the RAID type) will complete that check much faster than an aggregate of many disks, so with a shared aggregate the system could potentially be offline for a LOT longer.  This type of operation is quite rare - not trying to scare you here, just explaining the pros and cons of each approach.

The system will also reserve space for the filesystem (WAFL) and for snapshots (5% of each aggregate and 20% of each volume by default).  The snapshot reserves are configurable.
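
If you want to look at or change the snapshot reserves, it's along these lines (the volume/aggregate names are placeholders):

  filer> snap reserve vol1         (show the current snapshot reserve for vol1)
  filer> snap reserve vol1 10      (drop it from the 20% default to 10%)
  filer> snap reserve -A aggr1     (the aggregate-level reserve, 5% by default)
  filer> df -h vol1                (see how the volume's space actually breaks down)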

(3) Finally, and this is where I'm completely lost: What are some good ways to set this up, and how should I decide which one is best for me?

Well, it depends.

Without knowing what you are planning on using the storage for, there's a few general "rules of thumb" to follow.

  • Make your aggregates as big as possible - the more data disks in the aggregate, the more IOPS available to the volumes in the aggregate.
  • The default RAID group sizes are generally the ones to stick with, as they are the numbers we have come up with over many years of collecting performance and resiliency statistics, and they give you the best trade-off between performance and protection.  I think it's 16 disks for FC and SAS drives (2 parity, 14 data) and 14 for SATA drives (2 parity, 12 data), but it's been a while...  Sometimes it makes sense to increase the RG size for capacity reasons when you are only a few disks over the default of 16, but performance will taper off.  I don't think you can increase the RG size beyond about 28... again I am a bit foggy on the actual number.
  • RAID groups and aggregates CANNOT be made smaller, but can be made larger by adding disks (see the rough example below).  Volumes and LUNs can be resized up and down on the fly, so they are easy to reconfigure if you make something too small or too big.
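
A rough example of growing an aggregate - and because it's a one-way operation, double-check before you hit enter:

  filer> aggr options aggr1 raidsize 19   (raise the RAID group size so new disks join the existing group)
  filer> aggr add aggr1 4                 (add four spare disks to the aggregate - this cannot be undone)
  filer> aggr status -s                   (see which disks are still sitting as spares)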

In general, you should make your aggregates as big as you can.  As you have two storage controllers, you will need at least two aggregates (because they each need a root volume). 

If you split the storage up evenly between the nodes, this would give you 21 disks for each controller.  Best practice is to have two spares per controller - so that leaves you with 19 disks for each aggregate.  This is one of those cases where you might want to consider increasing the RG size to 19 - otherwise you would need two RAID groups to use all 19 disks, costing you 4 parity drives rather than 2.  (If you are going to be adding more disks in the near future, consider making two RAID groups instead - say one with 10 disks and one with 9 - then expand them as you add disks so you end up with 16 in each RG at some point.)

This configuration would leave you with one aggregate on each controller (2 in total), each with a capacity of about 6.2TB to then configure volumes and LUNs in.
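
As a sketch for NodeA (NodeB would be the same with its own 21 disks) - and bear in mind that if each node already has a small root aggregate, it will already own a few of those disks, so you would either grow that aggregate instead or move the root volume:

  NodeA> aggr create aggr1 -t raid_dp -r 19 19   (one 19-disk RAID-DP group: 2 parity + 17 data)
  NodeA> aggr status -s                          (the two disks left over should show up as hot spares)
  NodeA> df -Ah                                  (usable space should come out around the 6.2TB mark)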

If I create one huge 41-disk RG with 1 spare, then I have the following:

* 39 disks worth of space (2 lost due to parity)

***SNIP***

(I don't think I can create a 3-disk RAID-DP. I haven't tried. If I can, however, then that's hilarious.)

As the max RG size is around 28 disks, AND you need an aggregate for each controller, scenario 1 doesn't work.  Scenario 2 is a little closer to how we would configure a NetApp system, but again you need two aggregates...  As for contention between reads and writes, this can be an issue if the system is undersized for your workload and it gets overwhelmed.  All 3 RGs would be involved in the writes, as all volumes within the aggregate are spread out over every RG.  Depending on what you are doing, this could be a problem if you overwhelm the box, but the 3070 is quite a capable system and there is a lot of smarts built in to avoid, mitigate and compensate for these kinds of issues.  And as for 3-disk RAID-DP RAID groups: yes, you can, and yes, I have seen somebody set a system up like this...

It's possibly worth getting in touch with your local NetApp SE for a bit of a 101-type session to help you go over anything that isn't clear.  I think most of what I have typed out above is still accurate, but I haven't been "on the tools" for a couple of years so may not have the latest best practice information - again, talking to your local NetApp people could help out here.  I am sure that if I have made any glaring errors I will be corrected in short order in upcoming replies.

Best of luck.
