We're about to implement our NetApps in our Solaris environment and I was wondering if there were any guidelines with regard to zpool configuration. In particular, I was wondering whether to mirror the disks in the pool or not. From the Solaris Internals Best Practices wiki: "For production environments, configure ZFS so that it can repair data inconsistencies. Use ZFS redundancy, such as RAIDZ, RAIDZ-2, RAIDZ-3, mirror, regardless of the RAID level implemented on the underlying storage device. With such redundancy, faults in the underlying storage device or its connections to the host can be discovered and repaired by ZFS."
This makes it pretty clear that I should use mirroring. I'm hoping for some links on specific best practices with Solaris (containers, ZFS) and NetApp.
Why do you think it is necessary to use ZFS ?
Which FS would you use on a Solaris OS? Not sure anybody would use any other filesystem as ZFS is so far advanced.
The tricky bit is that ZFS is essentially what you would use if you wanted to build a resilient filer on the cheap and you didn't have a NetApp handy. There are no "best practice" documents as far as I can tell, probably because the combination is rather rare - the features of the two overlap so significantly that you would seldom find a scenario where both are intended to be used at the same time.
What services are you implementing on your Solaris boxen ?
"ZFS is essentially what you would use if you wanted to build a resilient filer on the cheap and you didn't have a NetApp handy" - err... if you are installing Solaris UNIX your choices are essentially UFS and ZFS. Whenever you're going to assign a NetApp LUN to a Solaris UNIX machine via FC, the chances are you're going to find an overlap. I'm not sure what you mean about using ZFS for a cheap resilient filer - you're probably referring to an i386 implementation of Solaris (possibly even using OpenSolaris). I'm using ZFS as part of enterprise Solaris on SPARC architecture - kicking it old school - definitely not on the cheap side.
My Solaris boxes run a variety of applications, including Oracle and Sybase. I'm not talking about Solaris in a space where it is competing with NetApp but in a situation where the NetApp is providing the SAN storage to a Solaris machine.
I'm not sure what ZFS RAID solutions are going to get you except more expensive storage. If you mirrored to 2 filers, then perhaps a bit more resilience, but that won't be cheap either. ZFS was not a terribly good idea for Oracle a couple of years ago, and I haven't followed along enough to know if that has changed much. It uses a ton of memory and is nowhere near as mature as UFS or VxFS or even NFS, but it has a lot of hype behind it. ZFS snapshots were pretty lame and mirroring was 'scp', more or less. Not exactly an enterprise solution. Not sure how well any of this works with SnapDrive...
If you don't use the EFI lun type, you are probably going to beat up your filers as well with a lot of extra I/O.
ZFS was/is a fine idea - it is a clean reimplementation of WAFL, after all. RAM and disk are cheap, so the argument that you can solve storage problems with lots and lots of caching rather than tight performance engineering sounds valid enough to me.
ZFS snapshots and mirroring are just like those of WAFL. The problem is not ZFS itself, but simply that the tools haven't been built around it to properly leverage those features. Maturity and ecosystem are the issues here, not technical deficiency. Sadly, Oracle don't seem to be minded to change that. I rather suspect that btrfs will come along and upstage it in a couple of years' time due to sheer brute force adoption on the major Linux distros.
Managed to get a document from someone on ZFS Best Practices with NetApp (TR-3603 June 2007). It says that I should definitely keep my ZFS RAID in place (mirroring for best performance). Also has a number of other useful titbits, like telling you how to align your WAFL and ZFS blocks.
Where have you got it from? Not found on fieldportal or NetApp library.
You could attach a copy of it and let us have a look at it. Doesn't show up after a cursory Google.
Forced to concur with the other guys here. Using ZFS mirroring on top of a resilient storage array (whether it is NetApp or anything else) is just a waste of disk space. The V-Series is WAFL on top of third party resilient storage, NetApp don't recommend adding another layer of resiliency there .. why would they do so for ZFS ? Something ain't right.
It would hardly be sensible to use UFS of course, but surely it would be sufficient to simply add the LUNs to a zpool and use plain old concatenation. ONTAP will protect your LUNs.
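To make the two layouts being argued about concrete, here is a minimal sketch that only builds the `zpool create` command lines, without running anything. The pool name `tank` and the device names are hypothetical, and the helper `zpool_create_cmd` is made up for illustration. Note that listing devices without a keyword gives a dynamically striped pool in ZFS, which is the closest thing to "plain old concatenation".

```python
import shlex


def zpool_create_cmd(pool, luns, mirrored=False):
    """Return a `zpool create` command line as a string (nothing is executed)."""
    if not luns:
        raise ValueError("need at least one LUN")
    args = ["zpool", "create", pool]
    if mirrored:
        if len(luns) % 2:
            raise ValueError("two-way mirrors need an even number of LUNs")
        # Consecutive LUNs are paired: each 'mirror a b' group is one mirror vdev.
        for i in range(0, len(luns), 2):
            args += ["mirror", luns[i], luns[i + 1]]
    else:
        # No vdev keyword: ZFS stripes dynamically across the listed devices
        # and relies on the storage array for redundancy.
        args += luns
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical device names; c2* and c3* stand for LUNs behind two controllers.
luns = ["c2t0d0", "c3t0d0", "c2t1d0", "c3t1d0"]
print(zpool_create_cmd("tank", luns))
print(zpool_create_cmd("tank", luns, mirrored=True))
```

With the pairing above, each mirror spans one c2 and one c3 device, which is the only arrangement in which the mirror adds something the filer is not already providing.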
Can you please attach a copy for us?
I'm kinda new around here, how do I go about attaching a PDF?
Anyway, I found the same doc referenced in http://communities.netapp.com/message/44802#44802 where the link is http://media.netapp.com/documents/tr-3603.pdf.
The idea for using mirroring is for zfs's self healing mechanism. Looking further into the post referenced, the block alignment is actually more of an issue and bears further investigation.
Ah, that's the one.
It is clearly a NetApp document but a lot of it seems to be a copy and paste of Sun ZFS technical material without much in the way of reference to its relevance on NetApp. For example, the benefits of ZVOLs are expounded without noting that they are completely redundant and a useless additional layer of complexity if you are already in an environment where you have a NetApp filer.
As a consequence, some of the claims made in it don't seem to make sense. For example :
"A mirrored configuration consumes more disk space but generally performs better with small
random reads. "
Irrespective of whether the data is random or sequential, if you set up a mirrored zpool you will have to do additional, redundant (=pointless), reads or writes for each read or write you do to the ZFS volume. Aside from doing additional pointless accesses, the filer's own cache will get thrashed more. With two LUNs in a mirror the cache benefit would theoretically be half of what it would be without the LUNs being mirrored.
"A RAID Z2 configuration offers excellent data availability, and performs similarly to RAID Z. RAID
Z2 has significantly better mean time to data loss (MTTDL) than either RAID Z or two-way mirrors."
If you configure RAID-Zx (or any type of RAID from any vendor) on multiple NetApp LUNs that are hosted on the same aggregate (or any storage device from any vendor that already implements RAID and scrubbing, which most of them do), there is no additional data security benefit - WAFL already protects against creeping corruption - but there will be an additional performance hit due to the additional accesses. If you were crazy enough to do RAID-Z3 you would be doing three extra accesses for every block. I would imagine this would have "interesting" (in the Chinese proverb sense) effects on the filer's own internal caching algorithms. Not to mention using substantial extra storage space.
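The "substantial extra storage space" point can be put into rough numbers. This is a back-of-the-envelope sketch only: it assumes equally sized LUNs in a single vdev (or in two-way mirror pairs), and it ignores ZFS metadata and padding as well as anything the filer itself reserves. The function name `usable_fraction` is made up for illustration.

```python
def usable_fraction(layout, n_luns):
    """Approximate fraction of raw LUN capacity left for data under a ZFS layout."""
    parity = {"stripe": 0, "raidz": 1, "raidz2": 2, "raidz3": 3}
    if layout == "mirror":
        # Two-way mirror pairs: every block is stored twice.
        if n_luns < 2 or n_luns % 2:
            raise ValueError("two-way mirrors need an even number of LUNs")
        return 0.5
    if layout not in parity:
        raise ValueError(f"unknown layout: {layout}")
    p = parity[layout]
    if n_luns <= p:
        raise ValueError("need more LUNs than parity devices")
    # p LUNs' worth of space goes to parity, the rest holds data.
    return (n_luns - p) / n_luns


for layout in ("stripe", "raidz", "raidz2", "raidz3", "mirror"):
    print(f"{layout:7s} {usable_fraction(layout, 6):.0%} of 6 LUNs usable")
```

Whatever the layout, this overhead is paid on top of the parity the filer already maintains underneath the LUNs.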
Those concerns won't apply if you are using zpools to create a mirror over multiple aggregates. But the document does not make this clear, and it should always be pointed out that you're a great deal better off using the native filer features for redundancy, mirroring, snapshots etc. (given that you've paid hefty $$$ for them) rather than adding an extra layer of complexity to sort-of do the same thing.
As for the self-healing thing .. the NetApp filer already does this, so it will catch and fix any errors (rebuilding physical disks as necessary) long before they become apparent to ZFS .. if ZFS even saw any corruption the filer would not be doing its job. The block alignment tips are worthwhile (and not an issue specific either to ZFS or to NetApp).
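Since alignment keeps coming up, here is a minimal sketch of the arithmetic behind it: a host partition is aligned when its starting byte offset is a multiple of the 4 KB WAFL block size. The 512-byte sector size and the example start sectors are assumptions for illustration, the function names are made up, and the sketch does not model any offset compensation the filer may apply depending on the LUN type.

```python
WAFL_BLOCK = 4096  # WAFL works in 4 KB blocks
SECTOR = 512       # assumed logical sector size on the host


def is_aligned(start_sector, sector_size=SECTOR, block_size=WAFL_BLOCK):
    """True if a partition starting at start_sector begins on a block boundary."""
    return (start_sector * sector_size) % block_size == 0


def next_aligned_sector(start_sector, sector_size=SECTOR, block_size=WAFL_BLOCK):
    """Smallest sector >= start_sector that falls on a block boundary."""
    sectors_per_block = block_size // sector_size
    # Ceiling division, then scale back up to a sector number.
    return -(-start_sector // sectors_per_block) * sectors_per_block


# Illustrative start sectors only; check the real values on your own LUNs.
for start in (34, 40, 63, 64):
    print(start, is_aligned(start), next_aligned_sector(start))
```

A misaligned start means every 4 KB host block straddles two blocks on the filer, so one host I/O can turn into two on the back end.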
IMHO, this document does not look like it was supposed to have been released.
If I went this way, I would separate my mirror over two aggregates (and two controllers). That leaves me with the concerns stated in your last 3 paragraphs, which I summarise as this:
1. Use native filer features (given that they cost a lot) rather than adding complexity.
2. NetApp self-healing abrogates ZFS self-healing.
3. Block alignment is an issue.
Are there any NetApp best-practices documents that cover these issues?