ONTAP Hardware
Hi Folks,
I'm trying to design a storage solution.
A FAS3020 with 3 shelves, fully populated with 42x 300GB FC disks.
The default RAID group size is 16 (14 data + 2 parity); the maximum RAID group size is 28.
Does anyone have some best-practice information? The NOW site doesn't have much info on that.
Kind regards
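As a back-of-the-envelope aid for this question, here is a quick sketch (my own illustration, not an official NetApp sizing tool) of how 42 disks could be carved into RAID-DP groups at different RAID group sizes, assuming two hot spares are held back and two parity disks per group:

```python
# Illustrative only: RAID-DP layout options for 42 disks.
# Assumptions: 2 hot spares held back, 2 parity disks per RAID group.

def layout(total_disks, rg_size, spares=2, parity_per_rg=2):
    """Return (num_raid_groups, data_disks) for a simple RAID-DP layout."""
    usable = total_disks - spares
    num_rgs = -(-usable // rg_size)            # ceiling division
    data_disks = usable - num_rgs * parity_per_rg
    return num_rgs, data_disks

for rg_size in (14, 16, 20, 28):
    rgs, data = layout(42, rg_size)
    print(f"rg_size={rg_size:2d}: {rgs} group(s), {data} data disks")
```

Note that at this disk count, RG sizes of 20 and 28 both land on two groups and the same data-disk count, so the larger RG size buys nothing here.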
Hi Eric,
You’re spot on… it’s a balance between capacity/performance needs and risk aversion…
And as you say, for the cost of one disk it may not be worth the risk… however, some people, especially at the FAS2020 end of the scale, need the capacity just as much…
But you’re right, it’s a matter of opinion really and, as you said, based on risk assessment…
The lovely flexibility of NetApp ☺
Sadly Paul also based in the UK…it’s the other guy who’s in Cyprus!!!
Hello,
NetApp revised the best practice sparing policy last year to make it more logical and applicable to the range of configurations we see in our customer base. There is only a single configuration in which we recommend using only a single spare drive - and that is a FAS2000 series (aka "entry" level systems) that is using only internal drives (no external storage attached). As some have pointed out already, the number of spares to keep on hand varies depending on what you are concerned about with the configuration. Here is the updated spares policy - the "official" NetApp best practice sparing policy:
------------------------------------------------------------------------------------------------------------------
HOW MANY HOT SPARES SHOULD I KEEP IN MY STORAGE CONFIGURATION?
Recommendations for spares vary by configuration and situation. In the past, NetApp has based spares recommendations strictly on the number of drives attached to a system. This is certainly an important factor, but it's not the only consideration. NetApp storage systems are deployed in a wide range of configurations. This warrants defining more than a single approach to determining the appropriate number of spares to maintain in your storage configuration.
Depending on the requirements of your storage configuration, you can choose to tune your spares policy toward one of the following approaches:
In the table below, consider each "approach" as the starting number of spares that is then modified by the "special considerations" as appropriate.
For RAID-DP configurations, consult the following table for the recommended number of spares.
Recommended Number of Spares:
- Minimum: two per controller
- Balanced: four per controller
- Maximum: six per controller

Special Considerations:
- Entry platforms: Entry-level platforms using only internal drives can be reduced to a minimum of one hot spare.
- RAID groups: Systems containing only a single RAID group do not warrant maintaining more than two hot spares for the system.
- Maintenance Center: Maintenance Center requires a minimum of two spares to be present in the system.
- >48-hour lead time: For remotely located systems, there is an increased chance that they might encounter multiple failures and completed reconstructions before manual intervention can occur. Spares recommendations should be doubled for these systems.
- >1,200 drives: For systems using more than 1,200 drives, an additional two hot spares should be added to the recommendations for all three approaches.
- <300 drives: For systems using fewer than 300 drives, you can reduce the spares recommendations for a balanced or maximum approach by two.
Selecting any one of the three approaches (minimum, balanced, or maximum) is considered to be the best practice recommendation within the scope of your system requirements. The majority of storage architects will probably choose the balanced approach, although customers who are extremely sensitive to data integrity might warrant taking a maximum spares approach. Given that entry platforms use small numbers of drives, a minimum spares approach would be reasonable for those configurations.
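As a rough sketch of the sparing policy above (my own reading of the table, not an official NetApp calculator; the order in which the special considerations are applied is my assumption):

```python
# Starting spares per controller, by approach, per the policy table.
BASE = {"minimum": 2, "balanced": 4, "maximum": 6}

def recommended_spares(approach, drive_count,
                       entry_internal_only=False,
                       single_raid_group=False,
                       remote=False):
    """Apply the special considerations to the base recommendation."""
    spares = BASE[approach]
    if drive_count < 300 and approach in ("balanced", "maximum"):
        spares -= 2                    # <300 drives: reduce by two
    if drive_count > 1200:
        spares += 2                    # >1,200 drives: add two
    if single_raid_group:
        spares = min(spares, 2)        # single RG: no more than two
    if entry_internal_only:
        spares = min(spares, 1)        # entry, internal drives only
    if remote:
        spares *= 2                    # >48-hour lead time: double
    return spares

# e.g. the balanced approach on the 42-drive FAS3020 in this thread:
print(recommended_spares("balanced", 42))   # 2
```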
Additional notes about hot spares:
NetApp does not discourage administrators from keeping cold spares on hand. NetApp recommends removing a failed drive from a system as soon as possible, and keeping cold spares on hand can speed the replacement process for those failed drives. However, cold spares are not a replacement for keeping hot spares installed in a system.
Cold spares can replace a failed part (speeding the return/replace process), but hot spares serve a different purpose: to respond in real time to drive failures by providing a target drive for RAID reconstruction or rapid RAID recovery actions. It's hard to imagine an administrator running into a lab to plug in a cold spare when a drive fails. Cold spares are also at greater risk of being “dead on replacement,” as drives are subjected to the increased possibility of physical damage when not installed in a system. For example, handling damage from electrostatic discharge can occur when retrieving a drive to install in a system.
Given the different purpose of cold spares versus hot spares, you should never consider cold spares as a substitute for maintaining hot spares in your storage configuration.
The RAID option raid.min_spare_count can be used to specify the minimum number of spares that should be available in the system. This is effective for Maintenance Center users, because when set to the value 2 it notifies the administrator if the system falls out of Maintenance Center compliance. NetApp recommends setting this value to the resulting number of spares that you should maintain for your system (based on this spares policy) so that the system notifies you when you have fallen below the recommended number of spares.
I would add some details to the information provided below. You may set "raid.min_spare_count" to 0, 1, 2 or more, but if you do so, I'd recommend changing "raid.timeout" as well. This option is usually set to 24, which represents the number of hours the system will run before preemptively shutting itself down once it no longer meets the RAID/disk options set. In other words, if your number of available spares [aggr status -s | vol status -s] goes below the number of required spares, your system will run in degraded mode until you meet the stated requirements. If you are unable to satisfy those requirements before the time limit has passed, the system will shut itself down to prevent any potential data loss.
That being said, you should calculate your own requirements based on:
- Type of disks
- Type of RAID
- Size of RAID groups
- Data risk assessment:
  - Can the system suffer a shutdown without impact to the business?
    - Yes: for how long? At what time of the day/night?
    - No: critical system
- What type of hardware warranty and support exists or needs to be set up: 24/7 with 4-hour response, or 8am-5pm business days only (which might still be critical)?
This is only a high-level overview. At this point, the risks would have to be identified and a series of contingencies provided for review and approval by the stakeholders, based on the initial requirements they stated.
Hope this helps as well.
Regards,
Allain Flores
Storage Consultant
Enterprise Storage Management - CDC
IBM Global Services
This transmission may contain information that is privileged, confidential
and/or exempt from disclosure under applicable law. If you are not the
intended recipient, you are hereby notified that any disclosure, copying,
distribution, or use of the information contained herein (including any
reliance thereon) is STRICTLY PROHIBITED. If you received this transmission
in error, please immediately contact the sender and destroy the material in
its entirety, whether in electronic or hard copy format.
Please consider the environment before printing this e-mail or any other
document
In other words, if your number of available spares [aggr status -s | vol status -s] goes below the number of required spares, your system will run in degraded mode until you meet the stated requirements. If you are unable to satisfy those requirements before the time limit has passed, the system will shut itself down to prevent any potential data loss.
Sorry, but this is incorrect. Degraded mode means a RAID group without protection (i.e. a single disk missing in RAID4, or two disks missing in RAID-DP). The number of spare disks does not contribute to degraded status, and the system will not shut down if the number of spares is low.
That is very true: although the system will nag you about being below the minimum spare count, it will not shut the system down because you don't have enough spares. "Degraded mode" describes a system that has one or more failed drives, and reflects the fact that system resources are being used to repair the drive (be it a Rapid RAID Recovery or a RAID reconstruction). A "degraded aggregate" is an aggregate that contains one or more failed drives, and a "degraded RAID group" is a RAID group that contains one or more failed drives. That is the common usage of "degraded" as it pertains to the storage subsystem today.
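This terminology can be sketched as follows (an illustrative model of my own, not ONTAP code): "degraded" follows failed drives, while a low spare count only draws a warning:

```python
# Illustrative sketch of the "degraded" terminology in this thread.

def raid_group_degraded(failed_in_group):
    """A RAID group is degraded if it has one or more failed drives."""
    return failed_in_group >= 1

def aggregate_degraded(failed_per_group):
    """An aggregate is degraded if any of its RAID groups is degraded."""
    return any(raid_group_degraded(n) for n in failed_per_group)

def low_spares_warning(available, required):
    """Low spares draw a warning, but do not make the system degraded."""
    return available < required

# One failed drive anywhere makes the aggregate degraded, regardless of
# how many spares are on hand; zero spares alone does not.
print(aggregate_degraded([0, 1, 0]))   # True
print(aggregate_degraded([0, 0, 0]))   # False
```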
Sorry, I got sidetracked on projects.
Clarification on raid.timeout from the command manual:
raid.timeout
Sets the time, in hours, that the system will run after a single-disk failure in a RAID4 group or a two-disk failure in a RAID-DP group has caused the system to go into degraded mode or double degraded mode, respectively. The default is 24, the minimum acceptable value is 0, and the largest acceptable value is 4,294,967,295. If the raid.timeout option is specified when the system is in degraded mode or in double degraded mode, the timeout is set to the value specified and the timeout is restarted. If the value specified is 0, automatic system shutdown is disabled.
I'd bring attention to the last sentence in regards to the automatic system shutdown…
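The manual text above can be modeled as a toy state machine (purely illustrative, not ONTAP's implementation): setting the option while degraded restarts the countdown, and a value of 0 disables automatic shutdown.

```python
# Toy model of the raid.timeout semantics quoted above.

class RaidTimeout:
    def __init__(self, hours=24):
        self.hours = hours          # configured raid.timeout value
        self.remaining = None       # countdown, active only while degraded

    def enter_degraded(self):
        """Start the countdown when the system goes degraded."""
        if self.hours > 0:
            self.remaining = self.hours

    def set_timeout(self, hours):
        """Setting the option while degraded restarts the timer;
        a value of 0 disables automatic shutdown."""
        self.hours = hours
        if self.remaining is not None:
            self.remaining = hours if hours > 0 else None

    def tick(self, hours):
        """Advance time; return True if the system would shut down."""
        if self.remaining is None:
            return False
        self.remaining -= hours
        return self.remaining <= 0
```

For example: with the default of 24, after 23 hours in degraded mode the system is still up; re-issuing the option resets the clock for another full period.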
Regards,
Allain Flores
I'll try to do your question justice! I was looking at this from two aspects: performance and long-term capacity. While the system does indeed have 42 disks today, tomorrow it may need additional capacity. So, by choosing a 15-disk RAID group, I'm not only assuring myself of the most efficient RG design, I'm also committing to the maximum amount of space.
Hi There...
Wondering if anyone has updated data with 900GB SAS drives. I am looking to create a 23-disk aggregate on a 3210, running 8.0.3. Given the aggregate maximum (50TB), this should be a good config. I don't really want to waste 4 disks in this config to parity.
thanks!
Hi John,
There are no massive changes re RG size recommendations:
Theoretically it is possible to have a RAID-DP aggregate with 23x (or even 28x) 900GB drives in one RG; however, best practice suggests keeping the RG size no bigger than 20.
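To put numbers on the parity trade-off in the 23-disk question, here is a quick sketch (my own illustration, assuming RAID-DP with two parity disks per RAID group, and ignoring spares and drive right-sizing):

```python
# Illustrative only: parity cost of one big RG vs. the RG-size-20 cap.

def parity_overhead(total_disks, rg_size, parity_per_rg=2):
    """Return (parity_disks, data_disks) for a RAID-DP aggregate."""
    num_rgs = -(-total_disks // rg_size)     # ceiling division
    parity = num_rgs * parity_per_rg
    return parity, total_disks - parity

# One 23-disk RG: 2 parity disks, 21 data disks.
# Capped at RG size 20: two RGs, so 4 parity disks, 19 data disks.
print(parity_overhead(23, 23))   # (2, 21)
print(parity_overhead(23, 20))   # (4, 19)
```

This is exactly the "4 disks to parity" cost the poster wants to avoid; the single large RG saves two disks at the price of a longer reconstruction exposure per group.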
Regards,
Radek