I am interested to hear what people do for the RAID config for vol0/aggr0. I have always left it as a 3-disk RAID-DP aggregate with vol0 on it. Now that disk sizes have gotten so big, is it normal and recommended practice to use RAID4 for aggr0 so as not to lose another disk? With only 3 disks in the aggregate you don't really benefit from the "lots of spindles in the aggregate" idea. I know RAID-DP protects against 2 disk failures in the aggregate, but it's a decision weighing the likelihood of that happening against losing potentially another 1 (or even 2) TB disk.
Anyone have thoughts on this?
Very interesting discussion... at this point we only have one customer who does this (they happen to have a MetroCluster as well, so it's a decently sized installation). Some thoughts are...
I am really curious to hear Scott's war stories...and if anyone from NetApp support can chime in that would be fantastic (i.e. how often do they see a separate aggr for the root volume saving people).
Another interesting thing that we will be seeing more of soon is 64-bit aggregate systems on 8.x. The root aggregate needs to be 32-bit. I haven't tested putting root on a 64-bit aggregate, but I saw that it is not supported in one of the docs. If we have a system where we want all 64-bit aggrs, we'll need a separate 32-bit aggregate for root.
If we have a system where we want all 64-bit aggrs, we'll need a separate 32-bit root aggregate for root.
Yep, this is exactly what I have heard as well.
I reckon this restriction will be gone over time, and in 'some' future 8.x.x release the root volume will gain 64-bit support (I don't know whether this is the plan, but it makes common sense to me).
This has quickly become a serious issue with the FAS6080s that we have. We are seeing the dreaded java.lang error in FilerView, forcing reboots of the filer to clear the space. We are going to have to end up making another small aggr and moving vol0 to it so it's not overwhelmed with I/O meant for other disks. Personally I am with you all: what is wrong with a couple of CF cards or some nice SSDs and some good old-fashioned RAID1? If anything, using that as the primary and using ndmpcopy (good call, Scott!) to keep a backup should it hit the fan. Come on NetApp, help us out here.
Keeping the controller "stateless" is a fairly important design criterion for OnTAP, which tends to rule out things like SSDs within the controller; the existing CF card is just there as a cache to speed up the boot process, and I doubt that NetApp would ever trust anything as important as the root volume to something that didn't have enterprise-class resiliency features. If a customer was concerned about the waste of three HDDs as a dedicated root volume, then I doubt that dedicating a couple of SSDs would seem like an attractive value proposition either.
Having said that, IMHO the problem solved by keeping dedicated root volumes in tiny aggregates has been substantially solved in a different way by the new versions of WAFLIron in OnTAP 7.3.1+ (the ones with the optional commit feature). As we move to massive 64-bit aggregates as the default configuration, the idea of needing to completely check an entire filesystem that may be hundreds of TiB, in those rare cases where we decide to mark the aggregate as inconsistent, will simply go away, because it's not a viable long-term approach. I know that a lot of work has already gone into reducing the situations where we will need to do this at all, and things like better versions of WAFLIron have already made significant improvements to the time it takes to get back to a consistent state if it ever does happen.
From my perspective the dedicated root volume is good practice for those customers with large systems who want the most conservative settings possible, and that's probably why the dedicated root aggregate remains best practice. In general, every NetApp best practice I've seen is the most conservative option available from the point of view of availability and reliability, a kind of "if you do this, then you can't go wrong", which is fair enough. Personally I'm more adventurous: if it were up to me, I'd probably use fewer hot spares than best practice recommends and only use 2 per controller along with RAID-DP regardless of the number of shelves in the system, but that probably explains why nobody asks me to write those best practice documents 🙂
Internal SSDs, Global Spares, Updated Storage Resiliency Document, Newer Proposed Architectures are all the hot topics internally in our netapp circle as well (SEs, Consultants and Users). I would love to see NetApp sponsoring a dedicated WebEx/MeetUp session to get all of these like minded folks together for a healthy discussion (or even better - lock them in a room for a few hrs, and see what comes out).
The war stories I can count on one hand, but it's a full hand, and I can't mention names... in a few of those cases, running wafl_check would have been faster, or could have been done online with wafliron, if the root aggregate had been able to come up separately. Most were on SATA-only systems with firmware bugs (which is why we always push for the latest AT-FCX firmware)... it likely is a 1% or even smaller edge case, like Adam said.
The ndmpcopy method works well... keep another volume on another aggregate, but use -f to force overwriting files. Or just vol copy would work too. The thing is getting it automated and checking that it is up to date, so you can mark the backup root volume as root. One customer made /etc a qtree and used QSM to another volume, but the issue there is making it writable. As a workaround, you could assign another aggregate as root, which will automatically create "AUTOROOT"; then you could break the /etc QSM, mark the volume as root, and reboot. This may be a more manageable method using QSM.
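As a rough sketch, the ndmpcopy-based backup described above might look like this on a 7-mode system (the volume name vol0_backup, aggregate aggr1, and size are made-up examples, not anything from this thread):

```shell
# Create a backup root volume on a different aggregate (names/sizes are examples)
vol create vol0_backup aggr1 20g

# Copy the running root volume's /etc, with -f to force overwriting existing files
ndmpcopy -f /vol/vol0/etc /vol/vol0_backup/etc

# If vol0 is ever lost, mark the backup as the new root volume and reboot
vol options vol0_backup root
reboot
```

The key operational point from the post still applies: the copy has to be automated and verified as current, or the "backup root" is stale exactly when you need it.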
I also typically rename vol0 to root, but that is more OCD than anything else, possibly.
Several SEs (a group of us, and growing) have made a wish list to have two CF cards or two small SSDs in the controller and not have root on disk... but it's probably not as easy as it sounds.
In my humble opinion, I'd rather allocate those 3 disks to a larger aggregate than just leave them for vol0.
NetApp best practices for this differ a bit from country to country; I know that, for example, the Dutch tend to build a 3-disk RAID-DP aggregate just for vol0.
If anyone else has comments or would like to sell their fish, I'd be keen to hear.
Agree...I see very little value in a dedicated root aggregate vs using the disks. Dedicated root volume? Definitely. But not the aggregate.
However if you insist on a dedicated root aggregate, I'd definitely go RAID4 and only use 2 disks.
We have some in-the-field war stories where not having a separate root aggregate made wafliron/wafl_check take longer... for larger installs, we default to a separate 3-drive root aggregate so that we don't have to wait for a long process to get the system back up. It doesn't help the time to fix the larger aggregate, but it does let us bring the system up sooner with support (in the last case where this happened, the GSC recommended a separate root aggr). This is a rare edge case, but one we have seen more than once. I did use 2-drive RAID4 root aggregates for a while, but background disk firmware updates require raid_dp, so we have to force disk_fw_update to each drive, and that can affect NDU upgrades if the system tries to update firmware on reboot because it didn't run in the background with the system up. On smaller systems (FAS2000, etc.), or with a low number of shelves on any system, I don't hesitate to mix the root volume into one large aggregate.
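For reference, carving out the small dedicated root aggregate described above would look roughly like this on 7-mode (the aggregate name and disk IDs are placeholders, not from this thread):

```shell
# Three-disk RAID-DP root aggregate (1 data + 2 parity), the conservative default
aggr create aggr_root -t raid_dp -d 0a.16 0a.17 0a.18

# Or the two-disk RAID4 variant (1 data + 1 parity) discussed above;
# note the caveat that background disk firmware updates require raid_dp
aggr create aggr_root -t raid4 -d 0a.16 0a.17
```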
To get a system into a "supportable" state, I keep a tiny TFTP server on my laptop and a set of ONTAP 7xxx_netboot.n images with me (or download them just before I go to the customer),
then just connect my laptop and do a netboot - so there's no need for a bootable root volume or even aggregate.
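A minimal sketch of that netboot, assuming a CFE/LOADER-style firmware prompt (the interface name, addresses, and image path are examples, and the exact syntax varies by platform and firmware version):

```shell
# At the boot firmware prompt, give the controller a temporary address
ifconfig e0a -addr=192.168.1.10 -mask=255.255.255.0 -gw=192.168.1.1

# Boot the kernel straight from the laptop's TFTP server, bypassing root on disk
netboot tftp://192.168.1.5/7xxx_netboot/kernel
```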
That is why I never made a separate aggr for vol0, even though NetApp best practices recommend it - the trade-off of losing spindles and wasting space rules it out.
There you go:
It actually gives a balanced view on separate vs. non-separate aggregate for root volume:
- For small storage systems where cost concerns outweigh resiliency, a FlexVol based root volume on a regular aggregate might be more appropriate.
- FlexVol recovery commands work at the aggregate level, so all of the aggregate's disks are targeted by the operation. One way to mitigate this effect is to use a smaller aggregate with only a few disks to house the FlexVol volume containing the root volume.
Thanks Radek !
I can't see that portion in the 7.3.2 version of the storage management guide.
So if I have a large filer, it is always recommended to have a separate small aggr0 to contain the root volume - am I right?
And after meeting the minimum root volume space, I can always serve data with the remaining available space in order to fully utilize the disks, right?
If people want to design for a less than 1% scenario, feel free. I find very few customers who want to do this.
As far as disk_fw_updates go, assuming you've got a spare around, you can always temporarily convert to RAID-DP, do your background update, then convert back to RAID-4. Both of those conversions can be done on the fly.
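That on-the-fly round trip might be sketched like this on 7-mode (aggr_root is a hypothetical aggregate name; the raid_dp conversion consumes a spare for the second parity disk):

```shell
# Temporarily promote the RAID4 root aggregate to RAID-DP
aggr options aggr_root raidtype raid_dp

# Run the disk firmware update, which can now proceed in the background
disk_fw_update

# Drop back to RAID4; the freed dparity disk returns as an unzeroed spare
aggr options aggr_root raidtype raid4
disk zero spares
```

Zeroing the returned spare up front avoids waiting for zeroing later if that drive gets pulled into an aggregate, which is the complaint raised further down the thread.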
Just a thought....
Even though it's rare, we have been bitten enough times to keep the separate aggregate... but only if the system is large enough that 3 drives don't make a huge difference. This has been a huge debate for some time, and I have gone back and forth depending on the customer requirements.
We have recommended that customers change from raid4 to raid_dp and then back to raid4 (a good idea and workaround), but they often don't want to go through this every time: waiting for the rebuild, then zeroing the spare drive after dropping back to raid4 so they don't have to wait for zeroing later if the drive is used. The only time we can ensure the process is done is when we do PS onsite for the upgrade... but the PS costs more than a single disk, so as long as there is enough room in the system/shelves for the extra drive (or 3 drives), the dedicated disks win.
At other customers, as long as they have 2 aggregates, we have had them automate an ndmpcopy of /root/etc from one aggr to a volume on another aggr; then, if there are any issues, they can run "aggr options root" against the other aggregate from maintenance mode and get the system back up. It would be even rarer to have to wafl_check 2 aggregates. With enough disks to need at least 2 aggregates, this should be as effective or even more resilient... as long as the /etc copy is done regularly.
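A sketch of that recovery path, under the assumption of two aggregates named aggr1 and aggr2 (hypothetical names) with the regular /etc copy already in place:

```shell
# Ongoing: keep a current copy of the root volume's /etc on the other aggregate
# (the thread's posters rename vol0 to "root"; etc_backup is a made-up volume name)
ndmpcopy -f /vol/root/etc /vol/etc_backup/etc

# Recovery: if aggr1 (holding root) is down, boot to maintenance mode and
# designate the surviving aggregate as root, then reboot normally
aggr options aggr2 root
halt
```

As noted above, marking an aggregate as root this way auto-creates an "AUTOROOT" volume, which is why having a recent /etc copy on that aggregate matters.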
Are there any updated best practices on this? I heard some of the WAFL tools are getting rewritten for 64-bit aggregates to allow for bigger limits... that might make this a non-issue if they can bring up/recover things quicker.