
RAID rebuild time for 144GB disks with RG size 27

naing_lin

Please let me ask a question here. A disk failed in the filer while spares were low. Reconstruction took a long time (about 7 hours), and performance was impacted during the rebuild. The option raid.reconstruct.perf_impact is set to medium. There is a single RAID group, configured as RAID-DP with 27 × 144GB disks (RAID group size 27).

One NetApp KB article (Solution ID# 18300) states that the rebuild time for 500GB or 1TB ATA disks is roughly 10 hours.

How long should RAID reconstruction take for 144GB disks with an RG size of 27, and why was performance so seriously impacted?

Please kindly help me with an answer; your help is really appreciated.

Thanks and Regards,

Naing


lwei

Hi Naing,

Roughly, it should take no more than a couple of hours. However, reconstruction time also depends on the number of back-end loops. What's the controller type and configuration?

Thanks,

Wei

mcope

The type of disk failure, the storage controller model, and the version of Data ONTAP will also impact rebuild times.

Most disk failures are 'soft' failures, where too many blocks are flagged as bad. In Data ONTAP 7.1 and newer, these failing drives are handled by Rapid RAID Recovery, which copies the good blocks to a spare drive. This significantly speeds up recovery (up to 4x faster in some cases).
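To make the difference between copying and reconstructing concrete, here is a minimal Python sketch of the idea behind a copy-based recovery. It assumes a simplified single-parity (XOR) RAID group rather than RAID-DP's row-plus-diagonal parity, and the names `xor_blocks` and `rapid_recovery_copy` are hypothetical, not part of Data ONTAP. Readable blocks are copied with a single read; only unreadable blocks are rebuilt by reading the same stripe from every other disk and XORing.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rapid_recovery_copy(
    failing_disk: list[Optional[bytes]],
    peer_disks: list[list[bytes]],
) -> list[bytes]:
    """Hypothetical sketch: copy readable blocks, rebuild only unreadable ones.

    failing_disk[i] is None when stripe i cannot be read from the failing drive.
    peer_disks holds every other disk in the (single-parity) group,
    including the parity disk.
    """
    spare: list[bytes] = []
    for stripe, block in enumerate(failing_disk):
        if block is not None:
            # Cheap path: one read from the failing disk.
            spare.append(block)
        else:
            # Expensive path: read this stripe from every peer and XOR.
            spare.append(xor_blocks([disk[stripe] for disk in peer_disks]))
    return spare


# Tiny three-disk example: two data disks plus XOR parity.
d0 = [b"\x01\x02", b"\x10\x20"]
d1 = [b"\x04\x08", b"\x40\x80"]
parity = [xor_blocks([d0[i], d1[i]]) for i in range(2)]
failing = [d0[0], None]  # stripe 1 is unreadable on the failing disk
print(rapid_recovery_copy(failing, [d1, parity]) == d0)  # True
```

The fewer blocks that fall through to the XOR path, the less load the recovery puts on the rest of the RAID group, which is why a soft failure recovers faster than a dead drive.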

A truly dead drive from a hardware failure requires reconstruction of all its data from parity. On smaller systems, especially busy ones, this takes more time, because the reconstruction process has to wait for the system load to drop to low-to-medium so as not to impose additional performance overhead. Large RAID groups also take longer to reconstruct, because recomputing each missing block requires reading the corresponding block from more disks.
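The effect of RAID group size on reconstruction work can be sketched with simple arithmetic. This assumes the failed disk is a data disk that is rebuilt from row parity alone, so each stripe reads the other data disks plus the row-parity disk (rg_size - 2 disks for RAID-DP, rg_size - 1 for RAID4). The function name is hypothetical, and the model ignores throttling, caching, and Data ONTAP's actual scheduling.

```python
def reconstruction_read_gb(rg_size: int, disk_gb: float, parity_disks: int = 2) -> float:
    """GB read from surviving disks to rebuild one failed data disk.

    Assumption: the disk is rebuilt from row parity only, so each stripe reads
    the remaining data disks plus one row-parity disk, i.e.
    rg_size - parity_disks disks (parity_disks=2 for RAID-DP, 1 for RAID4).
    """
    disks_read = rg_size - parity_disks
    if disks_read < 1:
        raise ValueError("RAID group too small for this parity scheme")
    return disks_read * disk_gb


for rg in (16, 27):
    print(f"RG size {rg}: read {reconstruction_read_gb(rg, 144):,.0f} GB")
# RG size 16: read 2,016 GB
# RG size 27: read 3,600 GB
```

Under this model, a 27-disk group reads about 1.8 times as much data per rebuild as a 16-disk group, all of it competing with client I/O for the same back-end bandwidth.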

naing_lin

The controller model is FAS3160 with DS14-MK2 shelves. In our case, reconstruction took about 7 hours to complete, with a noticeable performance impact. What average throughput (MB/s) can we expect during reconstruction?
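For comparison with the estimates in the replies, the observed rebuild rate can be worked out directly. This sketch assumes the whole nominal 144GB was written to the replacement disk in 7 hours and uses decimal units (1 GB = 1000 MB); the usable right-sized capacity is somewhat smaller, so the true rate is slightly lower. The function name is hypothetical.

```python
def average_rebuild_mb_s(disk_gb: float, hours: float) -> float:
    """Average rate at which the replacement disk was filled (decimal units)."""
    return disk_gb * 1000 / (hours * 3600)


print(f"{average_rebuild_mb_s(144, 7):.1f} MB/s")  # 5.7 MB/s
```

About 5.7MB/s is well below what a single FC disk can sustain, which is consistent with the rebuild being throttled or contending with client I/O.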

Thanks and Regards.

Naing.

localizedinsanity

One thing that can affect this: in RAID-DP (which you must be running to have 27 disks in an RG), I have been told that the first disk failure/rebuild runs at low priority regardless of the raid.reconstruct.perf_impact setting. This certainly explains some behaviors I have seen during rebuilds on RAID-DP configurations. On a second disk failure in the same RG, the priority of both rebuilds is bumped up to the configured setting.

lwei

Thanks for the info. But how many back-end loops are there? That's important to the reconstruction time.

Thanks,

Wei

radek_kubka

Hi Wei,

I've been watching this thread silently for a while.

How do back-end loops impact reconstruction time, though? Is reconstruction really that throughput-heavy? And do you mean the loops serving the affected RAID group (it's unlikely there is more than one) or all loops on the system?

Regards,

Radek

lwei

Hi Radek,

The back-end FC-AL loops play an important role in both the reconstruction time and application performance during the rebuild. Let's assume all 27 disks are on one 4Gbps loop (unlikely, but let's use it as an example). They have to share the loop's bandwidth of 400MB/s, so that's ~14MB/s per disk on average. Rebuilding a 144GB disk will then take about 10,000 seconds, or roughly 2.8 hours. But that's at full speed. When applications are running and the rebuild is throttled down, it will probably take much longer. Adding another loop can speed things up significantly, though I wouldn't expect it to double the speed.
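Wei's back-of-the-envelope estimate can be reproduced in a few lines. This is a sketch under the same assumptions as the post: all disks share one loop evenly, there is no host I/O or throttling, and units are decimal. The function name is hypothetical.

```python
def loop_rebuild_hours(loop_mb_s: float, disks_on_loop: int, disk_gb: float) -> float:
    """Best-case rebuild time when loop bandwidth is shared evenly by all disks.

    Assumes no host I/O, no throttling, and decimal units (1 GB = 1000 MB).
    """
    per_disk_mb_s = loop_mb_s / disks_on_loop
    return disk_gb * 1000 / per_disk_mb_s / 3600


print(f"{400 / 27:.1f} MB/s per disk")                  # 14.8 MB/s per disk
print(f"{loop_rebuild_hours(400, 27, 144):.1f} hours")  # 2.7 hours
```

Rounding the per-disk share down to 14MB/s gives the ~10,000 seconds (about 2.8 hours) quoted in the post; the unrounded figure is about 2.7 hours. Either way it is a lower bound, since any client load or throttling only stretches it.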

Regards,

Wei

rkaramchedu1

Wei

You mentioned:

They have to share the loop's bandwidth, 400MB/s

Isn't a 4Gbps FC loop ~500MB/s? Or is my understanding incorrect?

Thx

lwei

It's just a rough estimate. I'm usually on the conservative side. However, if you can get 500MB/s, more power to you. -Wei

lionetti

Actually, you can only get about 400MB/s, because 4Gb/s FC uses 8b/10b encoding, which means you divide by 10 instead of by 8 when converting bits to bytes. So the throughput of 4Gb/s FC (whose line rate is actually 4.25 Gbaud) is closer to 400MB/s than to 500MB/s.
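The encoding arithmetic can be checked in a couple of lines of Python. This is a minimal sketch that accounts only for 8b/10b encoding; the commonly quoted 400MB/s figure for 4Gb FC also allows for framing overhead, which is not modeled here. The function name is hypothetical.

```python
def fc_data_rate_mb_s(line_rate_gbaud: float) -> float:
    """Data rate after 8b/10b encoding: each byte is sent as 10 line bits."""
    return line_rate_gbaud * 1000 / 10


naive = 4 * 1000 / 8               # nominal 4Gb/s divided by 8 bits per byte
encoded = fc_data_rate_mb_s(4.25)  # 4.25 Gbaud line rate, 10 bits per byte
print(naive, encoded)  # 500.0 425.0
```

Dividing by 8 is what produces the 500MB/s figure; dividing the real line rate by 10 lands at 425MB/s before frame overhead, much closer to Wei's 400MB/s.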

lwei

Good point. Thanks, Chris. -Wei

naing_lin

Hello,

Let's say I would like to reduce the impact of reconstruction. What choices do I have other than the following?

1). Create another aggregate with a RAID group size of 16 (the optimal RAID group size) and copy the data there. This requires a migration.

2). Set the reconstruction impact to low. That will lengthen the rebuild time.

Please let me know whether there are any other workarounds or choices.

Best Regards,

Naing

naing_lin

Hello,

Sorry for the late reply.

Yes! The replies above are very helpful to me. I appreciate all of them, especially Wei's explanation. Thank you all!

Best Regards,

Naing.

lwei


Hi Naing, you are very welcome! Thanks, -Wei
