Legacy Product Discussions

vol0 - RAID4 or RAID-DP ?

Hi all

I am interested to hear what people do for the RAID config of vol0/aggr0. I have always left it as a 3-disk RAID-DP aggregate with vol0 on it. Now that disk sizes have grown so large, is it normal and recommended practice to use RAID4 for aggr0 so as not to lose another disk? With only 3 disks in the aggregate you don't really benefit from the "lots of spindles in the aggregate" idea. I know RAID-DP protects against 2 disk failures in the aggregate, but it's a trade-off between the likelihood of that happening and losing potentially another 1 TB (or even 2 TB) disk.

Anyone have thoughts on this?


Dave

Re: vol0 - RAID4 or RAID-DP ?

In my humble opinion, I'd rather allocate those 3 disks to a larger aggregate than leave them dedicated to vol0.

NetApp best practices for this differ a bit from country to country; I know that, for example, the Dutch tend to build a 3-disk RAID-DP just for vol0.

If anyone else has comments, or would like to sell their fish, I'd be keen to hear them.

Re: vol0 - RAID4 or RAID-DP ?

Agree...I see very little value in a dedicated root aggregate vs using the disks.  Dedicated root volume?  Definitely.  But not the aggregate.

However, if you insist on a dedicated root aggregate, I'd definitely go RAID4 and use only 2 disks.

Re: vol0 - RAID4 or RAID-DP ?

We have some in-the-field war stories where not having a separate root aggregate made wafliron/wafl_check take longer to run. For larger installs we default to a separate 3-drive root aggregate so that we don't have to wait for a long process to get the system back up. It doesn't reduce the time to fix the larger aggregate, but it does let us bring the system up sooner with support (the last time this happened, the GSC recommended a separate root aggr). This is a rare edge case, but one we have seen more than once.

I did use 2-drive RAID4 root aggregates for a while, but background disk firmware update requires raid_dp. So we have to force disk_fw_update to each drive, and that can affect NDU upgrades if the system tries to update firmware on reboot because the update couldn't run in the background with the system up.

On smaller systems (FAS2000, etc.), or any system with a low number of shelves, I don't hesitate to mix the root volume into one large aggregate.

Re: vol0 - RAID4 or RAID-DP ?

If people want to design for a less than 1% scenario, feel free.  I find very few customers who want to do this.

As far as disk_fw_updates, assuming you've got a spare around, you can always temporarily convert it to RAID-DP, do your background update, then convert it back to RAID-4.  Both of those conversions can be done on the fly.
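
For reference, the on-the-fly conversion described above looks something like this at the 7-mode console (the aggregate name aggr0 is a placeholder, and exact option behavior should be checked against your Data ONTAP release):

```
filer> aggr options aggr0 raidtype raid_dp   # takes a spare as the dparity disk; reconstruction runs in the background
filer> disk_fw_update                        # firmware update can now run with double-parity protection
filer> aggr options aggr0 raidtype raid4     # drop back to RAID4; the dparity disk returns to the spare pool
```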

Just a thought....

Re: vol0 - RAID4 or RAID-DP ?

Even though it's rare, we have been bitten enough times to keep the separate aggregate... but only on a system large enough that 3 drives don't make a huge difference. This has been a huge debate for some time, and I have gone back and forth depending on the customer requirements.

We have recommended that customers change from raid4 to raid_dp and then back to raid4 (a good workaround), but they often don't want to go through this every time: waiting for the rebuild, then zeroing the spare drive after dropping back to raid4 so they don't have to wait for zeroing if the drive is used later. The only time we can be sure the process is done is when we do the PS onsite for the upgrade... but weighed against the cost of a drive, the PS costs more than a single disk, as long as there is enough room in the system/shelves for the extra drive (or 3 drives).

At other customers, as long as they have 2 aggregates, we have had them automate an ndmpcopy of the root volume's /etc from one aggregate to a volume on the other. If any issues arise, they can "aggr options root" the other aggregate from maintenance mode and get the system back up... it would be even more rare to have to wafl_check 2 aggregates. With enough disks to need at least 2 aggregates, this should be as effective or even more resilient, as long as the /etc copy is done regularly.

Are there any updated best practices on this? I heard some of the WAFL tools are getting rewritten for 64-bit aggregates to allow for bigger limits... that might make this a non-issue if it can bring up/recover things more quickly.

Re: vol0 - RAID4 or RAID-DP ?

Very interesting discussion. At this point we only have one customer who does this (they happen to have a MetroCluster as well, so it's a decently sized installation). Some thoughts:

  • I'd almost never do a separate aggr for vol0 on 20x0 boxes -- just not enough spindles and the usable space percentage is already often rather low.
  • I really like Scott's idea about the scheduled NDMP copy....quite a nice workaround that still preserves usable space/doesn't give up I/O.

I am really curious to hear Scott's war stories...and if anyone from NetApp support can chime in that would be fantastic (i.e. how often do they see a separate aggr for the root volume saving people).

Re: vol0 - RAID4 or RAID-DP ?

The war stories I can count on one hand, but it's the full hand, and I can't mention names. After a few incidents where wafl_check would have been faster, or could have been done online with wafliron, if the root aggregate had been able to come up separately... most were on SATA-only systems with firmware bugs (which is why we always push for the latest AT-FCX firmware). It likely is a 1% or even smaller edge case, like Adam said.

The ndmpcopy method works well: keep another volume on another aggregate, and use -f to force overwriting existing files (or vol copy would work too). The catch is getting it automated and verifying it is up to date, so that you can mark the backup root volume as root. One customer made /etc a qtree and used QSM to another volume, but the issue there is making it writable. As a workaround, you could assign another aggregate as root, which automatically creates "AUTOROOT"; then you could break the /etc QSM, mark the volume as root, and reboot. That may be a more manageable method using QSM.
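
As a rough sketch of the ndmpcopy approach (the volume and aggregate names here are made up, and the exact steps should be verified against your ONTAP release):

```
# Scheduled job: copy the config directory to a volume on the second aggregate,
# with -f forcing overwrite of files that already exist
filer> ndmpcopy -f /vol/vol0/etc /vol/vol0_backup/etc

# Recovery: from maintenance mode, make the second aggregate the root aggregate
# (this auto-creates an "AUTOROOT" volume as described above), then reboot
*> aggr options aggr_data root
```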

I also typically rename vol0 to root, but that is more OCD than anything else, possibly.

Several SEs (a group of us, and growing) have made a wish-list request to have two CF cards or two small SSDs in the controller and not have root on disk at all... but it's probably not as easy as it sounds.

Re: vol0 - RAID4 or RAID-DP ?

To get a system into a "supportable" state, I have a tiny TFTP server on my laptop and a set of ONTAP 7xxx_netboot.n images with me (or I download them just before I go to the customer).

Then I just connect my laptop and do a netboot, so there's no need for a bootable root volume or even an aggregate.

That is why I never made a separate aggr for vol0, even though NetApp best practice recommends it: the trade-off of losing spindles and wasting space rules it out.

Mark

Re: vol0 - RAID4 or RAID-DP ?

It does not matter where you load the kernel from, network or CF: the system still cannot come up as long as the root volume cannot be mounted.

Re: vol0 - RAID4 or RAID-DP ?

Another interesting thing we will be seeing more of soon is 64-bit aggregates on 8.x systems. The root aggregate needs to be 32-bit. I haven't tested putting root on a 64-bit aggregate, but I saw in one of the docs that it is not supported. So on a system where we want all data aggregates to be 64-bit, we'll need a separate 32-bit aggregate for root.
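
A hedged sketch of what that layout might look like when creating aggregates in 8.x 7-mode (the -B block-format flag, names, and disk counts here are illustrative; check the aggr man page for your release):

```
filer> aggr create aggr_root -B 32 3               # small 32-bit aggregate to hold the root volume
filer> aggr create aggr_data -B 64 -t raid_dp 24   # 64-bit aggregate for data
```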

Re: vol0 - RAID4 or RAID-DP ?

If we have a system where we want all 64-bit aggrs, we'll need a separate 32-bit root aggregate for root.

Yep, this is exactly what I have heard as well.

I reckon this restriction will go away over time, and in some future 8.x.x release the root volume will gain 64-bit support (I don't know whether this is the plan, but it makes common sense to me).

Regards,
Radek

Re: vol0 - RAID4 or RAID-DP ?

Hi guys,

Where did you find the best practice with regard to creating vol0/aggr0 as a separate vol/aggr?

Regards,

HongWei

Re: vol0 - RAID4 or RAID-DP ?

There you go:

http://now.netapp.com/NOW/knowledge/docs/ontap/rel732/html/ontap/smg/GUID-306BB9AE-99CF-4C98-AB5F-C23A77FA4B6A.html

It actually gives a balanced view on separate vs. non-separate aggregate for root volume:

- For small storage systems where cost concerns outweigh resiliency, a FlexVol-based root volume on a regular aggregate might be more appropriate.

- FlexVol recovery commands work at the aggregate level, so all of the aggregate's disks are targeted by the operation. One way to mitigate this effect is to use a smaller aggregate with only a few disks to house the FlexVol volume containing the root volume.

Regards,
Radek

Re: vol0 - RAID4 or RAID-DP ?

Thanks Radek!

I can't see that portion in the 7.3.2 version of the Storage Management Guide.

So if I have a large filer, it is always recommended to have a separate small aggr0 to contain the root volume, am I right?

And after meeting the minimum root volume space, I can always serve data with the remaining available space in order to fully utilize the disks, right?

Regards,
HongWei

Re: vol0 - RAID4 or RAID-DP ?

This has quickly become a serious issue with regard to the FAS6080s that we have. We are seeing the dreaded java.lang error in FilerView, forcing reboots of the filer to clear the space. We are going to end up making another small aggr and moving vol0 to it so it's not overwhelmed with I/O for the other disks. Personally I am with you all: what is wrong with a couple of CF cards, or some nice SSDs and some good old-fashioned RAID1? If anything, use that as the primary and use the ndmpcopy (good call Scott!) as a backup should it hit the fan. Come on NetApp, help us out here.
