ONTAP Hardware
We recently purchased a FAS 2240-2 and a FAS 2240-4 to replace our existing SAN, and I've got some time to play around and get familiar with them. Anyway, the SE who configured the systems for us really led us astray: he specified four 200GB SSDs in each unit to give us 400GB of Flash Pool on only one of the heads in each FAS. We were told 400GB is the current pool size limit.
What we didn't realize is that in a hybrid aggr, the SSD RAID group must match the RAID protection of the HDD RAID groups, and since we went with RAID-DP, the SSD group must be RAID-DP too: one data, one parity, one dparity, and one hot spare. That leaves us with only 200GB of Flash Pool and three SSDs used just for RAID protection. Ridiculous! I didn't know this until the installers came out and set everything up, and I'm still steaming about it. I tried to convince the installer that couldn't be the case, but he called several guys he works with, and they said the console won't let you do it if you try: it must be RAID-DP, and it must have a hot spare.
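To spell out the math behind the frustration (a rough back-of-the-envelope sketch, assuming one hot spare is reserved and ignoring disk right-sizing, which is why nominal capacities show up a bit smaller in practice):

```python
# Rough model of usable Flash Pool cache in a single SSD RAID group.
# Assumes one hot spare is reserved and ignores right-sizing; this is
# my own sketch of the arithmetic, not ONTAP's sizing logic.
def usable_cache_gb(total_ssds, ssd_gb, raid_type, hot_spares=1):
    parity_disks = 2 if raid_type == "raid_dp" else 1  # raid4: single parity
    data_disks = max(total_ssds - parity_disks - hot_spares, 0)
    return data_disks * ssd_gb

print(usable_cache_gb(4, 200, "raid_dp"))  # 200 -> only one data SSD
print(usable_cache_gb(4, 200, "raid4"))    # 400 -> two data SSDs
```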
But looking through some docs today, I saw that it may be possible to convert the SSD RAID group to RAID4:
https://library.netapp.com/ecmdocs/ECMP1368404/html/GUID-C31EDDB7-7DDD-4042-995C-B93CF0F4B9DA.html
Another option, I think, is to add the hot spare into the RAID group, but I'm afraid the system will complain about not having a hot spare. With RAID4, we'd have two data, one parity, and a hot spare, giving us the maximum 400GB Flash Pool size.
Any thoughts? Should I just leave it alone and deal with only having 200GB flash pool?
Robert -
You seem to be correct in your thinking.
You should be able to change the raid type of just the SSD tier to raid4, and given your small pool of available drives, I would do so.
It seems a waste of expensive SSDs otherwise.
SSDs have a lower failure rate, and with so few drives the odds of a failure are lower still, so I wouldn't be too worried about not having DP protection there.
I've tried it on my lab simulators, and it worked just fine.
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff
Independent NetApp Consultant, K&H Research http://www.linkedin.com/in/eugenekashpureff
Senior NetApp Instructor, Unitek Education http://www.unitek.com/training/netapp/
(P.S. I appreciate points for helpful or correct answers.)
Well... I couldn't wait to try it, and I kind of wanted it to fail so I'd have an excuse to build up a FAS from scratch.
But it doesn't work. I tried this command from the docs:
aggr options aggr0 raidtype raid4 -T SSD
But it complains that '-T SSD' is not a valid option.
Do you remember what the syntax you used was?
Robert -
The documentation example you gave was for Cluster mode, as was the simulator I tried it on.
The 7-mode documentation seems to indicate it can be done there, too. From the 'aggr' command man page in 8.2:
"The -T parameter can be specified to change the RAID type of the HDD RAID groups or the SSD cache of a Flash Pool. To specify the SSD cache, use -T SSD. To specify the HDD RAID groups, specify any Data ONTAP disk type used in the HDD RAID groups of the Flash Pool. "
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff
Hmmm, I think that -T option may be for the 'add' command. Here are the errors I get when I try it a few different ways:
aggr options aggr0 raidtype raid4 -T SSD
aggr options: Too many arguments (beginning with '-T')
aggr options aggr0 -T SSD raidtype raid4
aggr options: '-T' is not a legal aggregate option.
Which has me wondering: what if I issued 'aggr options aggr0 raidtype raid4'? Would it change both RAID groups (the SSD and HDD groups in the hybrid aggr) to RAID4? I'm too cautious to try it.
Robert -
The documentation line was from the 'aggr options' section of the man page.
It was 7-mode 8.2 documentation. What version are you running ?
There's also the following under 'aggr add' on the man page:
The -t raidtype argument specifies the type for new RAID groups created when adding disks to the aggregate. Use this parameter when you add the first RAID group comprised of SSDs to a hybrid-enabled aggregate. Possible values are raid4 for RAID 4 and raid_dp for RAID DP. The default value is the type of the existing RAID groups of the aggregate.
That would imply that raid4 could have been set when first adding the SSDs to the aggregate.
I was wondering the same thing about changing the whole aggr to raid 4, then just changing the HDD disks back to DP.
Cautious? Is this system already in production?
I was hoping it wasn't yet, and that you could try the rebuild...
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff
aggr options aggr0 raidtype raid4
aggr options: Can't revert a raid_dp aggregate to raid4 as it results in 8 disks in the raid group, which exceeds the maximum raid group size of 7 disks for a raid4 aggregate.
Ah man.... looks like I'm going to be completely rebuilding.
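If I'm reading that error right, the check it trips on is simple (a toy sketch of my understanding, not ONTAP's actual code): converting raid_dp to raid4 keeps the group's current disk count, and that count has to fit under the smaller raid4 group-size limit.

```python
# Toy version of the check implied by the error above (my reading of it,
# not ONTAP's actual logic): a raid_dp -> raid4 conversion keeps all the
# disks in the group, so the count must fit the raid4 group-size limit,
# which the error message says is 7 for this configuration.
def can_convert_to_raid4(disks_in_group, raid4_max_group_size=7):
    return disks_in_group <= raid4_max_group_size

print(can_convert_to_raid4(8))  # False: an 8-disk group exceeds the limit
print(can_convert_to_raid4(7))  # True
```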
Arghhh... I just found this note in TR-4070:
"On a FAS2200 series system that is running Data ONTAP 8.2, the recommendation is to configure the SSD RAID group with RAID 4 protection and one hot spare SSD. With this configuration, data is protected, and an immediate and fast RAID group rebuild occurs if a SSD fails. If RAID 4 protection is chosen for an SSD RAID group in a Flash Pool aggregate on any other FAS or V-Series system, at least one hot spare SSD should be maintained for each node that has a Flash Pool aggregate configured on it."
Well... looks like I'm going to get adventurous here. Wish I had read this before the system implementers came onsite!
The crappy thing is I think I'll have to rebuild all 4 FAS heads. The docs say a Flash Pool can never be removed; the aggr must be destroyed. And if I destroy them, I'll take out my vol0, so that means a rebuild, I'm pretty sure.
Robert -
How many spare disks do you have?
Do you have any other aggr besides aggr0?
Vol0 can be copied to another aggr; then set the root option on the new volume and reboot to make it the new vol0.
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff
Robert -
I tried creating the mixed raid aggr on an 8.2 7-mode sim, and had success.
7m82a> aggr add test -t raid4 -g new -T SSD 8
Note: preparing to add 7 data disks and 1 parity disk.
Continue? ([y]es, [n]o, or [p]review RAID layout) p
The RAID group configuration will change as follows:
RAID Group       Current   NEW       RAID Type
----------       -------   ---       ---------
/test/plex0/rg0  8 disks   8 disks   raid_dp
/test/plex0/rg1            8 disks   raid4
I was also able to flip the RAID type of the SSD tier back and forth between raid4 and raid_dp:
7m82a> aggr options test raidtype raid_dp -T SSD
Aggregate test: cache raid group size is adjusted from 8 to 23 after changing raidtype.
7m82a> Fri Apr 18 02:33:29 GMT [7m82a:raid.config.raidsize.change:notice]: aggregate test:cache raidsize is adjusted from 8 to 23 after changing raidtype.
Fri Apr 18 02:33:29 GMT [7m82a:raid.rg.recons.missing:notice]: RAID group /test/plex0/rg1 is missing 1 disk(s).
Fri Apr 18 02:33:29 GMT [7m82a:raid.rg.recons.info:notice]: Spare disk v6.25 will be used to reconstruct one missing disk in RAID group /test/plex0/rg1.
Fri Apr 18 02:33:29 GMT [7m82a:raid.rg.recons.start:notice]: /test/plex0/rg1: starting reconstruction, using disk v6.25
7m82a>
7m82a> aggr options test raidtype raid4 -T SSD
Fri Apr 18 02:33:47 GMT [7m82a:raid.rg.recons.aborted:notice]: /test/plex0/rg1: reconstruction aborted at disk block 43264 after 0:18.11
Aggregate test: cache raid group size is adjusted from 23 to 8 after changing raidtype.
7m82a> Fri Apr 18 02:33:48 GMT [7m82a:raid.config.raidsize.change:notice]: Aggregate test:cache raidsize is adjusted from 23 to 8 after changing raidtype.
Fri Apr 18 02:33:48 GMT [7m82a:raid.vol.mixed.raid.type:info]: test is now a mixed RAID type aggregate.
Still not sure why the -T flag wasn't working for you earlier?
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff
AH HA, we're running 8.1.3, I thought this whole time we were on 8.2. I will upgrade to 8.2.1 today and try the command again.
You want your data in RAID-DP. That's my opinion.
It worked! After upgrading to 8.2.1, the -T argument was accepted. Now the aggregate shows 372 GB Flash Pool!
Here's what I did:
This command changed the SSD raid group to raid4:
aggr options aggr0 raidtype raid4 -T SSD
"aggr status -r" showed two SSD spare disks. I added the specific SSD that used to be in the RAID-DP set because it was showing as not being zeroed-out:
aggr add aggr0 -g rg1 -d 0b.00.0
That command added another data disk to the RAID4 group after zeroing it out. The only negative I see so far is that I can't edit the aggregate in System Manager now; it says "This aggregate comprises mixed RAID type. You cannot edit aggregates having mixed RAID type."
I can live with that.
Robert -
I had to wonder when the raid conversion worked on the sim for me.
Versions matter.
Another bit of errata for me to remember.
Glad to hear the change worked for you !
At your service,
Eugene E. Kashpureff
Thanks TREBOR2000! Your solution worked like a charm!
Looks like this thread has been closed for a while, but I thought I would mention: our NetApp SE recommended that we use RAID-DP instead of RAID4 and just add the spare disk into the aggregate (and turn off the no-spares notification if it complains). The logic was that RAID-DP would recover faster than RAID4 with a spare, because the "spare" would already be hot. Any thoughts on this?
Mikker,
At least for the FAS2200 series, it is recommended to use RAID4 for the Flash Pool.
From TR-4070:
"On a FAS2200 series system that is running Data ONTAP 8.2, the recommendation is to configure the SSD RAID group with RAID 4 protection and one hot spare SSD. With this configuration, data is protected, and an immediate and fast RAID group rebuild occurs if a SSD fails. If RAID 4 protection is chosen for an SSD RAID group in a Flash Pool aggregate on any other FAS or V-Series system, at least one hot spare SSD should be maintained for each node that has a Flash Pool aggregate configured on it."
I have seen several recommendations to use RAID4 (2 cache + 1 parity + 1 spare) instead of RAID-DP (2 cache + 2 parity, no spare), but this applies to small boxes.
As far as I know, the only way to disable the low-spares warning is to set the "raid.min_spare_count" option to zero. That is a global option, and it will apply to all of the controller's RAID groups, whether they contain SSDs or not.
IMHO, it is certainly not a best practice.
Saying RAID-DP would recover faster than RAID4 with a spare because the "spare" would already be hot is absolute nonsense.
Since the SSD is in the aggregate, it's not a spare disk, it's a parity disk. If one SSD fails, the aggregate will be in a degraded condition with no spare to rebuild onto, and you will not be warned of the low-spare condition, since you disabled the warning.
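One way to lay the trade-off out (just a toy comparison of the two 4-SSD layouts being argued about here, not a model of actual ONTAP behavior):

```python
# Toy comparison of the two 4-SSD cache layouts under discussion.
# "tolerates" = simultaneous failures survivable without data loss;
# "rebuilds"  = whether a spare exists to restore full protection.
layouts = {
    "raid4 + hot spare": {"parity": 1, "spares": 1},
    "raid_dp, no spare": {"parity": 2, "spares": 0},
}
for name, lay in layouts.items():
    tolerates = lay["parity"]
    rebuilds = lay["spares"] > 0
    print(f"{name}: tolerates {tolerates} failure(s); "
          f"spare to rebuild with: {rebuilds}")
# raid_dp survives two overlapping failures, but after the first one it
# stays degraded until a disk is physically replaced; raid4 survives only
# one, but the hot spare rebuilds it back to full protection on its own.
```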
Can you elaborate? Our NetApp SE is still insisting that RAID-DP would be best. I get that we won't be warned about the low-spare condition (which we hope to get past with proper monitoring; as I understand it, the AutoSupport request for disk replacement would still work), but otherwise the failure condition seems the same (two disks can fail before we have data loss), and the recovery time seems like it would be faster with RAID-DP. I don't see how it's "absolute nonsense".
For Cluster-Mode with 8.2 and higher:
Change the RAID group type for the SSD disks in a hybrid aggr:
cl1::> aggr modify -aggregate aggr_d01 -raidtype raid4 -disktype SSD
Then add the released disk back to the aggr:
cl1::> aggr add-disks -aggregate aggr_d01 -disktype SSD -raidgroup rg1 -diskcount 1