Erasing Misconceptions Around RAID & Erasure Codes

** The following is a GUEST POST by Peter Corbett, Vice President & Chief Architect of NetApp (aka "Father of RAID-DP") **


There has been a great deal of interest in recent years in erasure codes.  The class of algorithms that people normally associate with the term includes algorithms that add a parameterized amount of computed redundancy to clear-text data, such as Reed-Solomon coding, and algorithms that scramble all data into a number of different chunks, none of which is clear text.  In both cases, all data can be recovered if and only if any m of the n distributed chunks can be recovered.  It is worth noting that, strictly speaking, RAID 4, RAID 5, RAID 6, and RAID-DP are also erasure codes by definition.  However, that is neither here nor there – XOR parity-based schemes have different properties than the "new" algorithms, used in some systems, that people have in mind when they talk about "erasure codes".
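The m-of-n property above also determines the raw storage overhead of such a code. As a back-of-envelope sketch (the 10-of-16 parameters below are my own illustrative choice, not from the post):

```python
def raw_overhead(m, n):
    """An m-of-n erasure code stores n chunks, any m of which
    suffice to recover the data, so raw bytes = n/m * data bytes."""
    return n / m

# e.g. a 10-of-16 code stores 1 PB of user data in 1.6 PB raw,
# while surviving the loss of any 6 of the 16 chunks.
print(raw_overhead(10, 16))
```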



Use-cases for each

Erasure codes are being used for deep stores, for distributed data stores, and for very scalable stores.  They are commonly used in RAIN systems, where the code covers disk, node, and connectivity failures, requiring data reconstruction after a node failure (instead of HA takeover, which is much faster).  These erasure codes are more computationally intensive on both encode (write) and decode (reconstruct) than XOR parity-based schemes like RAID-DP.  In fact, one of the big motivations for developing RAID-DP was that the "industry-standard" Reed-Solomon code for dual-parity RAID-6, as well as the less widely used Information Dispersal algorithms that protect against more than one failure, are more computationally intensive than RAID-DP.  RAID algorithms are also very well suited to sequential access in memory, as they can work with large word sizes.  Many of the complex erasure codes are based on Galois Field arithmetic, which works practically only on small (e.g. 4-, 8-, or 16-bit) quantities, although there are techniques for parallelizing it in hardware, on GPUs, or using the SSE* instructions on Intel architecture processors.
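The computational gap is easy to see side by side. A minimal sketch (my own code, not NetApp's implementation): XOR parity is a single pass of XORs that could run on full machine words, while Reed-Solomon needs GF(2^8) multiplications, which are inherently bit- and byte-granular (real implementations use lookup tables or the SSE/GFNI tricks mentioned above).

```python
def xor_parity(blocks):
    """RAID-4/5 style parity: plain XOR across data blocks.
    Shown byte-wise here, but the same operation works on
    arbitrarily large machine words."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def gf256_mul(a, b):
    """Multiply two bytes in GF(2^8) with the 0x11D polynomial,
    bit by bit -- the kind of small-quantity arithmetic
    Reed-Solomon coding is built on."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return result
```

Even this toy version shows the asymmetry: the XOR path is one cheap operation per word, the GF(2^8) path is a loop (or a table lookup) per byte.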




Erasing Limitations

Basically, an erasure code only works when you know what you’ve lost.  It provides sufficient redundancy to recover from a defined number of losses (failures).  If you don’t know what you’ve lost, you need an error detection and correction code, which requires a higher level of redundancy.
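The "you must know what you've lost" point can be sketched in a few lines (assumptions mine: single XOR parity, one known-missing block). The same parity that rebuilds a known erasure tells you nothing about *which* block is bad if corruption is silent:

```python
def recover_erasure(blocks, parity, lost_index):
    """Rebuild the block at a *known* lost position by XOR-ing
    the parity with all surviving blocks.  If lost_index were
    unknown, this parity alone could not identify the bad block."""
    rebuilt = bytearray(parity)
    for i, block in enumerate(blocks):
        if i == lost_index:
            continue
        for j, b in enumerate(block):
            rebuilt[j] ^= b
    return bytes(rebuilt)
```

This is exactly the distinction between an erasure code (positions of losses are known) and an error-correcting code (positions must be discovered, which costs extra redundancy).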


For disks, the known-loss failure modes that we can see are disk failures, sector media failures, and platter- or head-scoped failures.  All of these require reconstruction from redundant information, which in our case is dual parity (row and diagonal parity covering all double failures).  There are other failures that can invoke reconstruction, e.g. connectivity failure to a disk shelf.


However, there are other modes of disk failure that are more insidious, and that require some way to determine which data is bad from a set of data that the disk subsystem is claiming is good – silent errors.  To cover those cases, we add metadata and checksums to each block of data.  If we detect a problem there, we can then use parity to reconstruct.  So, just saying that you have an erasure code that protects against more than two losses is not sufficient to claim superior protection of data.  You have to have worked through all the failure modes and made sure you can protect against those failures.  We've hardened ONTAP over nearly 20 years of existence to provide a very high level of resiliency against all modes of disk failure in combination.
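The two-step defense described above can be sketched as follows (a simplification under my own assumptions – CRC-32 checksums and single XOR parity stand in for ONTAP's actual block checksums and dual parity). The checksum turns a *silent* error into a *known* erasure, which parity can then repair:

```python
import zlib

def write_block(data):
    """Store a data block alongside its checksum metadata."""
    return {"data": data, "crc": zlib.crc32(data)}

def scrub_and_repair(stored, parity):
    """Verify each block's checksum; any block that fails is now a
    known erasure, rebuilt by XOR-ing parity with the good blocks."""
    for i, blk in enumerate(stored):
        if zlib.crc32(blk["data"]) != blk["crc"]:
            rebuilt = bytearray(parity)
            for j, other in enumerate(stored):
                if j != i:
                    for k, b in enumerate(other["data"]):
                        rebuilt[k] ^= b
            blk["data"] = bytes(rebuilt)
    return stored
```

Without the checksum step, the corrupted block would pass as good data and the parity would never be consulted – which is the gap the post is warning about.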


That said, we are very aware of the trend to larger disks and the impact that has on reconstruction times and on the probability of higher order multiple failures.  While we can’t disclose plans and roadmaps publicly, we do have a good understanding of this area that will allow us to continue to meet the reliability needs of future FAS & E-Series systems as well as future disks and solid state media.


Let's talk when you have erasure codes out of the box.  Scale-out infrastructures demand single-copy data stores across multiple data centers, and erasure codes across nodes is the path to get there.  Nice to see you guys talking about the science behind data storage a bit, nice work!


Hey Blake,

Agreed.  We should probably brief you on our plans in this regard.  Let Mike and team know when you want to schedule that.


This topic needs much more exposure. Some irresponsible vendors make lofty claims about the absolute safety of Erasure Codes (EC), while this post outlines the limitations as well as the advantages. Quite frankly, without T10 DIF integration, ECs are playing Russian roulette with the durability of data.  Over the years, I've witnessed all kinds of hardware (spinning rust, ASICs, FPGAs, and even CPUs) fail in spectacularly creative ways. Silent data corruption is perhaps the most insidious failure mode, which only mature storage vendors even talk about, much less know how to deal with.

Store your data at your own risk with EC-only solutions!

Thanks for articulating a new buzzword that is popping up in more discussions and showing the similarities between Erasure Coding and RAID. The big difference for me is still performance and geography: RAID = local, Erasure Coding = geographical. On the flip side, RAID = performance, Erasure Coding = geographical reliability. It is the constant struggle of achieving high performance with high levels of reliability.

I've always thought of EC as just another form of RAID. In fact, I'd say that RAID also only works when you know what you've lost. At the enterprise level, with either RAID or EC, silent-failure detection is a must.  Didn't realize some EC vendors weren't providing this detection.  Thanks for pointing that out.

Larry @ NetApp

To say that EC is another form of RAID is overly simplistic. The fact is that RAID is all but dead when the conversation is about big data. When we're talking about high drive densities in a large system hosting unstructured data, drive rebuild times increase to points that just aren't safe. In large systems with lots of drives, the law of averages says that there will always be drive failures, which put RAID sets at significant risk of data loss, all the time, for a long time. EC is actually more reliable in this respect, IF metadata is adequately dispersed and protected.

EC also makes better use of disk capacity than RAID. Sorry NetApp, but in our experience with your storage (we have ~2 PB on the floor and 8 years of experience), we notice performance degradation once capacity hits about 60%. By 80%, we're adding disk, because performance dips to unacceptable levels. This isn't true with EC object stores; we don't have to set aside 20%+ overhead for performance. Most EC systems are not bound by drive density, either. For example, if a 4 TB drive fails, and 12 TB drives are at some point the latest and greatest, users can replace the 4 TB drive with a 12 TB one. RAID doesn't offer that kind of flexibility.

Going back to capacity optimization and data protection, EC object storage makes better use of storage capacity in general. For example, if I want to geographically protect 1 PB of data in a RAID-based system, I'll need to replicate my data, resulting in 2X the storage requirement. In an EC system with geo-dispersal technology, I can store and protect 1 PB of data in 1.3 – 1.7 PB of addressable storage.

EC is not a buzzword, it's a viable technology. Companies such as Google and Facebook have long since written off RAID as a viable storage technology for big data. Lots of enterprise vendors and startups alike have recognized the value of EC technology and are deploying hundreds if not thousands of PB of EC systems.

Hi Val,

What are NetApp's plans to implement erasure coding technology? We've done a ton of vendor research over the past 6 months, and NetApp's name hasn't come up in conversation. Just curious where you guys think big data is headed, other than E-series systems and C-Dot.