ONTAP Discussions

Reallocation for volume on SSD aggregate

JohnChen
4,393 Views

I have an AFF8080EX with 5 SSD shelves.

The ONTAP version is 9.0.

There are two SSD aggregates, one on each controller.

Each aggregate has 57 disks (3 RAID groups of 19 disks each, plus 3 spares).

We want to add one SSD shelf to increase the SSD aggregate size.

I plan to add 12 of the new SSDs to one SSD aggregate and the other 12 to the other SSD aggregate to increase both.

Each aggregate will then have 69 disks (3 RAID groups of 23 disks each, plus 3 spares).

 

My questions are:

 

1. Do I need to run volume reallocation after adding disks to the current SSD aggregates? (For spinning disks, I agree volume reallocation should be done.)

2. How is new data read/written to the aggregate after the new disks are added?

   I was told the data will be read/written to the new disks until they even out with the other disks, and only then be distributed across all 69 disks.

3. If #2 is correct, will I see a hot spot on those new disks later? Right after the disks are added, we will be building a database that is very busy with reads and writes.

4. Anything else I need to do or pay attention to after increasing the aggregate size?

 

 

 

1 ACCEPTED SOLUTION

bobshouseofcards
4,372 Views

Hi John -

 

Good questions. 

 

When an aggregate is substantially in use, adding disks means that proportionally more of the blocks on the free list come from the new disks than from the existing disks.  Item #2 in your post is correct.  As more blocks on the free list are concentrated on the new disks, those disks tend to be selected for new data more than the existing disks, and thus tend to be more heavily used.  With spinning disks this can be an issue, as each disk might only have 150 IOPS (or so, assuming 10K disks) to provide to the data load.

 

One mitigating factor is the rate of growth.  If your data set is primarily growth (new data), the effect of adding disks is magnified.  If disk use is mostly change with just a little growth, then balance might be achieved quickly as blocks on the older disks start to free up, depending on the rate of change.  We use volume reallocation to speed up the balancing effect of natural data use because a hot spinning disk is a significant choke point.
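For reference, a per-volume reallocation scan on clustered ONTAP is started from the cluster shell roughly like this (the SVM and volume names below are just placeholders, and you should verify the exact syntax and options on your release):

::> volume reallocation measure -vserver svm1 -path /vol/db_vol   (check how unevenly the volume is currently laid out)
::> volume reallocation start -vserver svm1 -path /vol/db_vol     (run an optimization scan of that volume)
::> volume reallocation show                                      (watch progress)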

 

Of course, those are the general rules for spinning disk.  SSD changes the calculus a bit because of the order-of-magnitude change in IOPS capability.  The same discussion with respect to blocks on the free list applies.  However, a half shelf of SSDs can still potentially outpace the ability of the controller to move data on the SAS links (depending on SSD type and SAS link speed).  While you would still see the new disks running hotter than the originals if you tracked individual disk utilization hour by hour, the rate at which balance is achieved naturally is similarly much higher.  Because each SSD can still vastly outperform a spinning disk, the effect of a short-term hot SSD is negligible.

 

The latency and balance effects of spinning disks are why the general practice is to add a fresh aggregate rather than expand an existing one.  Just as SSD effectively removes disk-based latency effects from normal operations, it similarly upends the general rule on aggregate expansion, in my opinion.

 

The only consideration I'd add is to expand your aggregates using full RAID groups.  There is some overhead at the controller in managing RAID groups that are not the same size (number of disks in each RAID group) across the aggregate.  At this scale it's likely not enough to make a difference.  If the controller is not heavily loaded or driving a highly latency-sensitive application, you likely won't notice a difference.  But for a highly sensitive application, a latency change from, say, 1.1ms to 1.2ms is a 9% change in average performance and could impact that type of application.
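For the expansion itself, each aggregate is grown with a single add-disks step from the cluster shell, roughly like this (the aggregate name is a placeholder, and you should confirm the RAID group sizing and placement options on your release before running it):

::> storage aggregate show-status -aggregate aggr_ssd_node1               (note the current RAID group layout)
::> storage aggregate add-disks -aggregate aggr_ssd_node1 -diskcount 12
::> storage aggregate show-status -aggregate aggr_ssd_node1               (confirm where the new disks landed)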

 

 

Hope this helps.

 

Bob Greenwald

Senior Systems Engineer | cStor

NCIE SAN ONTAP, Data Protection

 

 

 

Kudos and accepted answers are always appreciated.

 


2 REPLIES 2

J-L-B
4,331 Views
JohnChen
I recently added SSDs to some AFF8080EXs also, one aggregate on each node like you. I was seeing that my first RAID group was more heavily utilized than the new RAID group. Run statit -b and then statit -e on that node to compare activity and performance across the aggregate's RAID groups.

The recommendation from support was to run reallocate start -A aggr_SSD (or whatever your aggregate name is) from the node shell. That physically reallocates all the blocks in the aggregate without doing each volume: it reallocates the blocks, then redirects the volumes, then finishes. I used ::> node run -node clus1-01 reallocate start -A -o aggr_SSD. The -A specifies an aggregate and -o specifies run once; otherwise it defaults to a daily interval. Run the same command with reallocate status to see the progress. Be cautious if your aggregates and node are overtaxed. I am running 8.3.2 but think the command is the same in 9. Call support and open a quick case and they will verify.
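Putting that together, the sequence looks roughly like this (the node and aggregate names are from my setup, so substitute your own and verify on your release):

::> node run -node clus1-01 statit -b                            (start collecting per-disk statistics)
    ...wait a few minutes under a normal workload...
::> node run -node clus1-01 statit -e                            (stop and print the report, grouped by RAID group)
::> node run -node clus1-01 reallocate start -A -o aggr_SSD      (one-time, aggregate-level reallocate)
::> node run -node clus1-01 reallocate status                    (check progress)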

Also check out NABox with Grafana. There's a large thread in the communities about it. Once running and collecting there is a canned disk performance page where you can see your cluster-aggregate-raid group utilization in a clocked line graph and compare all of them. https://nabox.tynsoe.org/