2009-03-09 08:35 PM
So who's suffering from ingestion?
Posted in Storage, 9th March 2009 11:00 GMT
Comment: The fastest deduplication on the planet is performed by an 8-node Sun cluster using Falconstor deduplication software, according to a vendor-neutral comparison.
Backup expert W Curtis Preston has compared the deduplication performance of different vendors' products. He uses suppliers' own performance numbers and disregards multi-node deduplication performance if each node has its own individual index.
Preston says that a file stored on a node that has never seen it before would not be deduplicated against copies of the same file already stored on the other nodes in the set, because each node is blind to what the others store.
A Data Domain array is an example of an array of deduplication systems that do not share a global index. Preston says: "NetApp, Quantum, EMC & Dell, (also) have only local dedupe... Diligent, Falconstor, and Sepaton all have multi-node/global deduplication."
Nodes in a 5-node Sepaton deduplication array, for example, share a global index and the nodes co-operate to increase the deduplication ratio. In this situation a multi-node deduplication setup acts as a single, global deduplication system.
Preston compares the rated speeds for an 8-hour backup window, looking at the data ingest rate and the deduplication rate. As some vendors deduplicate inline, at data ingest time, and others deduplicate after data ingestion, known as post-process, these two numbers may well differ.
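The gap between the two rates matters because a post-process system must still deduplicate everything it ingested during the window. The sketch below is a minimal illustration, not Preston's method. It uses the FalconStor/Sun and IBM/Diligent figures quoted in this article, and it assumes decimal units (1 TB = 1,000,000 MB). The helper name `window_stats` is hypothetical.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # assumption: decimal units

def window_stats(ingest_mb_s: float, dedupe_mb_s: float, window_hours: float = 8.0):
    """Return (TB ingested during the window, hours needed to dedupe that data)."""
    ingested_mb = ingest_mb_s * window_hours * SECONDS_PER_HOUR
    # Inline systems dedupe as data arrives (dedupe rate == ingest rate), so this
    # equals the window length; when dedupe is slower, it runs past the window.
    dedupe_hours = ingested_mb / dedupe_mb_s / SECONDS_PER_HOUR
    return ingested_mb / MB_PER_TB, dedupe_hours

# FalconStor/Sun figures from the article: 11,000 MB/s ingest, 3,200 MB/s dedupe
tb, hours = window_stats(11000, 3200)
print(f"{tb:.1f} TB ingested, {hours:.1f} h to dedupe")  # 316.8 TB ingested, 27.5 h to dedupe

# IBM/Diligent is inline at 900 MB/s, so dedupe finishes with the 8-hour backup
print(window_stats(900, 900)[1])  # 8.0
```

Under these assumptions, an 8-hour FalconStor/Sun backup at full ingest speed would leave about 27.5 hours of deduplication work.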
He compared deduplication speeds from EMC (Disk Library), Data Domain, FalconStor/Sun, IBM/Diligent, NetApp, Quantum/Dell and Sepaton/HP. (HP OEMs the Sepaton product.)
The Falconstor/Sun combo topped the ingest scores at 11,000MB/sec using an 8-node cluster and Fibre Channel drives. It was followed by Sepaton/HP with 3,000MB/sec and then EMC with 1,100MB/sec. Quantum/Dell ingested at 800MB/sec with deduplication deferred to post-process and not run inline.
NetApp was the slowest, ingesting data at 600MB/sec. The configuration had two nodes, but each node deduplicated data on its own. Quantum/Dell would ingest at 500MB/sec if deduplication ran inline.
The fastest deduplication engine was the Falconstor/Sun one, rated at 3,200MB/sec. It was followed by Sepaton/HP at 1,500MB/sec, then by IBM/Diligent at 900MB/sec, Data Domain at 750MB/sec with EMC trailing at 400MB/sec. Preston couldn't find any NetApp deduplication speed numbers.
Preston also looked at the numbers for a 12-hour backup window. If a vendor's ingest rate is more than twice its deduplication rate, it would need more than 24 hours to deduplicate 12 hours' worth of ingested data, so deduplication could not keep up with a daily backup cycle. This means its effective ingest rate for a 12-hour backup run can be no more than twice its deduplication rate.
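The 12-hour rule can be written as a cap: effective ingest = min(ingest, dedupe × 24 / 12). The sketch below is hedged. It assumes deduplication can run across the whole 24-hour cycle, including while backups are still arriving. The function name is hypothetical, and the rates are the ones quoted in this article.

```python
def effective_ingest(ingest_mb_s, dedupe_mb_s, window_hours=12.0, cycle_hours=24.0):
    """Highest ingest rate whose data can still be deduplicated within one cycle.

    Assumes deduplication can run for the whole daily cycle, so it can process
    at most dedupe_mb_s * cycle_hours worth of data per day.
    """
    cap = dedupe_mb_s * cycle_hours / window_hours  # 2x dedupe rate for 12 h / 24 h
    return min(ingest_mb_s, cap)

rates = {  # (ingest MB/s, dedupe MB/s) as quoted in the article
    "FalconStor/Sun": (11000, 3200),
    "Sepaton/HP": (3000, 1500),
    "EMC": (1100, 400),
}
for vendor, (ingest, dedupe) in rates.items():
    print(vendor, effective_ingest(ingest, dedupe))
```

With these inputs, FalconStor/Sun is capped at 6,400 MB/s and EMC at 800 MB/s. Sepaton/HP's 3,000 MB/s ingest sits exactly at twice its dedupe rate, so it is not reduced.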
He also has a discussion of restore speeds for deduplicated data, known as inflation or rehydration. The sources for his numbers and the products used are listed on his blog.
This is the first comprehensive and vendor-neutral deduplication speed comparison, and is well worth a look. ®
2009-03-09 08:36 PM
Performance Comparison of Deduped Disk Vendors
Written by W. Curtis Preston
Thursday, 05 March 2009
This blog entry is, to my knowledge, the first article or blog entry to compare all these numbers side by side. I decided to do this comparison while writing my response to Scott Waterhouse's post about how wonderful the 3DL 4000 is, but then I realized that this part was substantial enough to deserve its own post. At the end of this post is a table that compares the backup and dedupe performance of the various dedupe products.
What Data Domain is doing, however, is shipping the DDX “array,” which is nothing but markitecture. It is 16 DDX controllers in the same rack. They refer to this as an “array” or “an appliance” which can do 42 TB/hr, but it is neither an array nor an appliance. It is 16 separate appliances stacked on top of each other. It’s only an array in the general sense, as in “look at this array of pretty flowers.” I have harped on this “array” since the day it came out and will continue to do so until Data Domain comes out with a version of their OS that supports global deduplication. Therefore, I do not include this "array's" performance in the table at the end of this blog article.
In contrast, Diligent, Falconstor, and SEPATON all have multi-node/global deduplication. Diligent supports two nodes, Falconstor eight, and SEPATON five. So when Diligent says they have “a deduplication appliance” that dedupes 900 MB/s with two nodes, or SEPATON says their VTL can dedupe 1500 MB/s with five nodes, or Falconstor says they can dedupe 3200 MB/s with eight nodes, I agree with those statements – because all data is compared to all data regardless of which node/head it was sent to. (I'm not saying I've verified their numbers; I'm just saying that I agree that they can add the performance of their boxes together like that if they have global dedupe.)
NetApp, Quantum, EMC, and Dell have only local dedupe. That is, each engine will only know about data sent to that engine; if you back up the same database or filesystem to two different engines, it will store the data twice. (Systems with global dedupe would store the data only once.) I therefore do not refer to two dedupe engines from any of these companies as “an appliance.” I don’t care if they’re in the same rack or managed via a single interface; they’re two different boxes as far as dedupe is concerned.
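The store-it-twice behaviour can be illustrated with a toy model. It is a minimal sketch, not how any listed product works internally. It assumes fixed 4 KiB chunks keyed by SHA-256 digests, and it models a global index as one dictionary shared by both engines. `DedupeEngine` and `store_on_two_engines` are hypothetical names. `random.Random.randbytes` needs Python 3.9 or later.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products chunk differently

class DedupeEngine:
    """Toy dedupe engine: keeps each unique chunk once, keyed by its SHA-256 digest."""

    def __init__(self, index=None):
        # Engines handed the same dict share one index (global dedupe);
        # engines created without one get a private index (local dedupe).
        self.index = {} if index is None else index

    def write(self, data: bytes) -> int:
        """Ingest data and return how many bytes had to be stored as new chunks."""
        stored = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).digest()
            if key not in self.index:
                self.index[key] = chunk
                stored += len(chunk)
        return stored

def store_on_two_engines(data: bytes, global_index: bool) -> int:
    """Back up the same data to two engines and return the total bytes stored."""
    shared = {} if global_index else None
    a, b = DedupeEngine(shared), DedupeEngine(shared)
    return a.write(data) + b.write(data)

backup = random.Random(1).randbytes(64 * CHUNK_SIZE)  # stand-in for one database backup
print(store_on_two_engines(backup, global_index=False))  # 524288: stored twice
print(store_on_two_engines(backup, global_index=True))   # 262144: stored once
```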
For the most part, I used numbers that were published on the company's website. In the case of EMC, I used an unofficial blog written by an EMC employee. Then I applied some math to standardize the numbers. In a few cases, I have also used numbers supplied to me via an RFI that I sent to vendors. If the vendor has global/multi-node/clustered dedupe, I give the throughput number for their maximum supported configuration. But if they don’t have global dedupe, I give the number for one head only, regardless of how many heads they may put in a box and call it “an appliance.”
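The standardization rule above can be expressed as a small function. Per-node rates are summed only when the nodes share a global index. Otherwise a single head counts. This is a sketch of the rule as described, not code from the post. `DedupeConfig` and `comparable_rate` are hypothetical names, and the per-node figures come from the vendor paragraphs in this post. For EMC, the 4406's 2,200 MB/s box rate is halved to get one head.

```python
from dataclasses import dataclass

@dataclass
class DedupeConfig:
    product: str
    per_node_mb_s: int   # throughput of a single node/head/engine
    max_nodes: int       # nodes in the largest supported configuration
    global_dedupe: bool  # do all nodes share one dedupe index?

def comparable_rate(cfg: DedupeConfig) -> int:
    """Sum node rates only when the nodes share a global dedupe index."""
    if cfg.global_dedupe:
        return cfg.per_node_mb_s * cfg.max_nodes
    return cfg.per_node_mb_s  # one head only, however many share a rack

configs = [
    DedupeConfig("IBM/Diligent dedupe", 450, 2, True),
    DedupeConfig("SEPATON dedupe", 300, 5, True),
    DedupeConfig("Falconstor dedupe (FC drives)", 400, 8, True),
    DedupeConfig("EMC 4406 ingest", 2200 // 2, 2, False),
]
for cfg in configs:
    print(f"{cfg.product}: {comparable_rate(cfg)} MB/s")
```

This prints 900, 1500, 3200 and 1100 MB/s, matching the figures quoted in the post.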
For EMC, I used the comparison numbers found on this web page. EMC declined to answer the performance questions of my RFI, and they haven't officially published dedupe speeds, so for dedupe speed I had to use the performance numbers published in this blog entry on Scott Waterhouse's blog. He says that each dedupe engine can dedupe at 1.5 TB/hr. The 4106 is one Falconstor-based engine on the front and one Quantum-based dedupe engine on the back. The 4206 and the 4406 have two of each, but each Falconstor-based VTL engine and each Quantum-based dedupe engine is its own entity and they do not share dedupe knowledge. I therefore divided the numbers for the 4206 and the 4406 in half. The 4406’s 2200 MB/s divided by two is the same as the 4106 at 1100 MB/s. (The 4206, by that math, is slower.) And 1.5 TB/hr of dedupe speed translates into 400 MB/s.
For Falconstor, I used this data sheet where they state that each node can back up data at 1500 MB/s, or 5 TB per hour, and that they support 8 nodes in a deduped cluster. They have not published dedupe speed numbers, but they did respond to my RFI. They said that each node could do 250 MB/s if you were using SATA drives, and 400 MB/s if you were using Fibre Channel drives. I used the fastest number and noted in the table that it required FC drives. (That will certainly affect cost.)
IBM/Diligent says here that they can do 450 MB/s per node, and they support a two-node cluster. They are also an inline box, so their ingest and dedupe rates will be the same. One important thing to note is that IBM/Diligent requires FC disks to get these numbers. They do not publish SATA-based numbers. That makes me wonder about all these XIV-based configs that people are looking at and what performance they're likely to get.
Quantum publishes this data sheet that says they can do 3.2 TB/hr in fully deferred mode and 1.8 TB/hr in adaptive mode. (Deferred mode is where you delay dedupe until all backups are done, and adaptive dedupe runs while backups are coming in.) I used the 3.2 TB/hr for the ingest speed and the 1.8 TB/hr for the dedupe speed, which translates into 880 and 500 MB/s, respectively.
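The TB/hr to MB/s conversion used throughout the post is simply division by 3,600 seconds. The sketch below assumes decimal units (1 TB = 1,000,000 MB), and the function name is hypothetical. The exact results land close to, but not exactly on, the rounded figures quoted in the post. For example, 3.2 TB/hr comes out at about 889 MB/s, and the post quotes 880.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # assumption: decimal units

def tb_per_hr_to_mb_per_s(tb_per_hr: float) -> float:
    """Convert a throughput in TB per hour to MB per second."""
    return tb_per_hr * MB_PER_TB / SECONDS_PER_HOUR

for label, rate in [("Quantum deferred (ingest)", 3.2), ("Quantum adaptive (dedupe)", 1.8)]:
    print(f"{label}: {tb_per_hr_to_mb_per_s(rate):.0f} MB/s")
```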
Finally, with SEPATON, I used this data sheet where they say that each node has a minimum speed of 600 MB/s, and this data sheet where they say that each dedupe node can do 25 TB/day, or 1.1 TB/hr, or 300 MB/s. Since they support up to 5 nodes in the same dedupe domain, I multiplied that times 5 to get 3000 MB/s of ingest and 1500 MB/s of dedupe speed.
Backup & dedupe rates for an 8-hour backup window
2009-03-10 02:22 PM
As I said over at Curtis' blog, this is a really great resource, although I have two concerns.
First, all the numbers are vendor-published. Deduplication is particularly prone to the "best case scenario" problem -- many of these vendors can dedupe endless streams of zeroes (or any other repeating pattern) much more quickly than random data!
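The best-case problem is easy to see with a toy chunk-and-hash model. A stream of zeros collapses to a single unique chunk, so almost nothing has to be written. Random data produces no duplicates at all. This is a minimal sketch under stated assumptions: it uses fixed 4 KiB chunks and SHA-256 keys, whereas many products use variable-size chunking. The function name is hypothetical, and `random.Random.randbytes` requires Python 3.9 or later.

```python
import hashlib
import random

def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Logical bytes divided by the bytes of unique fixed-size chunks."""
    unique = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return len(data) / sum(unique.values())

size = 256 * 4096                          # 1 MiB test stream
zeros = bytes(size)                        # best case: one repeating pattern
noise = random.Random(0).randbytes(size)   # worst case: no repeats
print(dedupe_ratio(zeros))  # 256.0
print(dedupe_ratio(noise))  # 1.0
```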
Second, dedupe is for more than just backup. Dedupe is a great way of cutting costs in VTL and D2D backup, but integrating deduplication much earlier in the data lifecycle can also cut costs significantly. Performance and functionality requirements are very different in dedupe for backup vs. dedupe for archive, and so none of the products listed really serves the archive market well.
Right now data is written to tier 1 primary storage at a cost of as much as $30 to $50/GB, and then backed up at an aggregate total cost of several dollars more. Much of this data could be moved much earlier to an archive primary storage tier at $3/GB or less, with an even lower effective cost once deduplicated. Replication can reduce or eliminate the need for backup of this tier entirely. When you're talking about petabytes of data, you can't always afford to be down for the restore period. With economic pressures, any business would be remiss not to look at deploying an effective archive tier.
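The "effective cost" point is simple arithmetic: the cost per logical GB is the raw cost per GB divided by the dedupe ratio. The sketch below uses the per-GB prices from the comment and a decimal petabyte. The 5:1 dedupe ratio is purely a hypothetical example, not a figure from the comment, and the function name is also hypothetical.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, dedupe_ratio: float) -> float:
    """Cost per logical GB when each physical GB holds dedupe_ratio logical GB."""
    return raw_cost_per_gb / dedupe_ratio

PB_IN_GB = 1_000_000  # decimal petabyte
for tier, cost in [("tier 1 at $30/GB", 30.0), ("tier 1 at $50/GB", 50.0), ("archive at $3/GB", 3.0)]:
    print(f"{tier}: ${cost * PB_IN_GB:,.0f} per PB")

# Hypothetical 5:1 dedupe ratio on the archive tier
print(f"archive with 5:1 dedupe: ${effective_cost_per_gb(3.0, 5.0):.2f}/GB")
```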
At Permabit we developed our Enterprise Archive product specifically to serve these needs, and believe we have developed the only truly scalable deduplication solution for archive data, while also providing levels of data protection far beyond what is available with RAID. I talk a little more about the underlying economics over at my blog in the article at http://blog.permabit.com/?p=77.
2009-03-15 09:57 PM
I'm not entirely sure that hawking your products on another vendor's community forums is what I would consider to be "good form", especially after having called NetApp "Slimy" in comments on a news article.
In any case, most of the economics of archiving don't really apply to NetApp platforms. I wrote a blog post about that around a month ago, which you can find here.