ONTAP Discussions

EMC vs NetApp Deduplication - Fact Only Please

wade_pawless
35,458 Views

I'm writing this in hopes of becoming better educated on the facts, not opinions, about the pros and cons, as well as the similarities and differences, between NetApp's current deduplication technology and EMC's current deduplication technology.

I have deployed NetApp solutions in my previous environment, but my current (new) workplace utilizes EMC. On a personal note, I prefer NetApp hands down. However, my responsibility is to define the capabilities and make decisions from the facts.

I have read a bit about EMC and NetApp fighting over Data Domain around 2009. When talking to EMC, their deduplication recommendation is Data Domain.

EMC claims that their Data Domain product provides real-time deduplication at a 4KB block level, which they claim is much more efficient. EMC mentioned that NetApp does deduplication at around 128KB. The bottom line is a claim to better performance capabilities. Thoughts?
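To make the block-size argument concrete, here is a minimal sketch of fixed-block deduplication (not either vendor's actual algorithm; the synthetic data and the helper name are illustrative only). It shows why a finer chunk size can find more duplicates: one changed byte "poisons" only the small block containing it, not a whole 128KB region.

```python
import hashlib

def dedup_ratio(data: bytes, block_size: int) -> float:
    """Chunk data into fixed-size blocks and return logical/unique block ratio."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# Synthetic data: a repeating 4 KiB pattern with one modified byte,
# so fine-grained chunking isolates the change in a single block.
pattern = bytes(range(256)) * 16          # 4 KiB
data = bytearray(pattern * 32)            # 128 KiB of duplicates
data[5000] ^= 0xFF                        # one changed byte

print(dedup_ratio(bytes(data), 4 * 1024))    # 16.0 -- only 2 unique 4 KiB blocks
print(dedup_ratio(bytes(data), 128 * 1024))  # 1.0  -- the whole region is "unique"
```

Real dedupe engines are far more sophisticated (fingerprint databases, variable-length chunking on some products), but the granularity trade-off the vendors are arguing about is essentially the one shown here.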

Does the current version of ONTAP utilize SIS or deduplication?

EMC claims SIS is a "limited form of deduplication". http://www.datadomain.com/resources/faq.html#q5

Please clarify the facts regarding the type of deduplication utilized by NetApp as well as any thoughts to the comments above. Facts only please.

As a side note, I'm currently comparing the NetApp V-Series with EMC product lines. The goal is to place a single device in front of all SANs to provide snapshot and deduplication capabilities for all SAN data. Over time we'll be bringing multiple storage vendors into our environment. The V-Series is a one-stop shop: we can virtualize all data regardless of vendor, provide snapshot capability, dedupe the data, and simplify management. EMC's solution requires me to purchase two new devices, an EMC VG2 and an EMC Data Domain. Even with EMC's recommendations, there is no capability to snapshot other vendors' data; only the Data Domain appliance will provide dedupe for all vendors. The EMC VG2 is being recommended to consolidate multiple file storage servers into CIFS as well as provide NFS for virtualization. So, in the end, EMC is saying buy one appliance from us to provide dedupe and a second appliance to provide NAS capabilities. Wait a minute, all of these features are built into a single NetApp... Thoughts?

30 REPLIES

wade_pawless
5,905 Views

First, I would like to say that I appreciate all of you taking a moment to reply and share your knowledge with me. Since posting this thread I have continued to work with EMC as well as NetApp to further define the architecture requirements for storage virtualization enabling centralized snapshots, deduplication, and storage consolidation while maximizing ROI regardless of back-end vendor.

The bottom line:

NetApp: regarding architecture simplification as well as capabilities, NetApp wins. However, cost comparisons have not yet been completed.

In a scenario where you have a primary site and secondary site which will become your DR, then a hot site later, NetApp requires less hardware.

(NetApp’s Solution) To provide snapshots, deduplication, consolidated file storage via CIFS, fast backup library technology (SnapVault), and NFS for virtualization storage, and to enable all of these capabilities on existing SANs while providing all SAN capabilities in a unified architecture, only one appliance appears to be necessary: a V-Series. If you want all of this data sent offsite, you can place a lower-end NetApp FAS, such as the 3100, at your DR site. In total we’ve added two appliances as well as some storage at the DR site, correct?

(EMC’s Solution) To facilitate CIFS and NFS as stated above, EMC recommends the VG2, which essentially adds a NAS appliance to the architecture. To facilitate deduplication, EMC refers you to their Data Domain product line, which adds another appliance. Lastly, EMC’s answer to my requirement of leveraging existing other-vendor storage to maximize ROI and prevent acquisition of storage that is already in place was…. we’d look into your current EMC capacity and look for areas to expand, such as upgrading 1TB drives to 2TB. The bottom line is EMC would require that I purchase more EMC storage to replace other vendors' storage. Now, if you want to continue this path into a DR site, you’ll be doubling the hardware.

So,

Onsite: 1 VG2 + 1 Data Domain + additional storage + (EMC SAN already in place)

DR Site: 1 VG2 + 1 Data Domain + new EMC storage processor ("head" in NetApp terms) + additional storage

Additionally, EMC’s solution provides no snapshot capability for the other vendors' storage. The result is more storage consumption for backups: since we could not snapshot, we’d be forced to rely on an application-level backup management product, adding more layers, complexity, and licensing costs.

Remember I said facts only, so let’s list the downside of the V-Series. EMC informed me that storage virtualized through the V-Series suffers a 30% loss in capacity. I spoke to NetApp, who told me the number was 20%. Any thoughts or info on this are welcome.

Thanks again, please let me know what you think.

chriszurich
5,140 Views

Here are the NetApp Data ONTAP overhead numbers I'm aware of.

WAFL overhead = 20% (no way to get around this)

Aggregate snap reserve capacity = 5% (can be changed)

Volume snap reserve capacity = 20% (can be changed)

aborzenkov
5,140 Views

WAFL overhead (reserve) is 10%. Where does 20% come from?

chriszurich
5,140 Views

I got the 20% WAFL overhead number from a recent NCDA training I attended. Where are you getting 10% overhead from? Perhaps someone from NetApp can weigh in?

aborzenkov
4,915 Views

The WAFL reserve is 10%. It is documented everywhere; see, for example, kb34044.

chriszurich
4,915 Views

I stand corrected, thanks for the reference KB.

thomas_glodde
4,915 Views

The V-Series comes with two forms of checksumming: block checksum (BCS) and zone checksum (ZCS). Block checksum is recommended, though it causes a 12.5% capacity overhead; zone checksum isn't preferred but causes no such overhead. This can be found in the V-Series docs.

srnicholls
4,913 Views

Hi

I recently did an evaluation of a V-Series, and part of it was to attach it to a CX to verify behaviour. The whole question of how much disk space is 'lost' was of interest to me, as the documentation wasn't making sense.

Here are my figures from the evaluation:

1138GB of raw CX disk, in RAID 5, creates a 976GB CX LUN (the largest you can attach in ONTAP 7.3.x), presented to the V-Series as an 854GB raw disk, 845GB usable after ONTAP formatting, and 721GB after creating an aggregate on it.

And you need one spare array LUN on the V-Series.

Overall the process was pretty easy, and my V-Series already had native disk shelves attached.

Cheers


Steph

aborzenkov
4,913 Views

the documentation wasn't making sense


I wonder which documentation you mean.

1138GB of raw CX disk, in RAID 5, creates a 976GB CX LUN (the largest you can attach in ONTAP 7.3.x), presented to the V-Series as an 854GB raw disk, 845GB usable after ONTAP formatting, and 721GB after creating an aggregate on it.


Could you explain what you mean by "raw disk" and "ONTAP formatting"? 854GB is exactly 87.5% of 976GB, which means you created a BCS (block checksum) LUN. As explained in the documentation, there is a second option, ZCS (zone checksum), which does not have this space overhead. 721GB is approximately 85% of either 854GB or 845GB, which translates into the 10% WAFL reserve plus the 5% default snap reserve. It is hard to tell whether the remaining difference is due to a decimal/binary mix without seeing the actual numbers.
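The capacity chain described above can be checked with a little arithmetic. This is a hedged sketch of that chain only (GB treated as plain decimal units throughout; real systems mix binary and decimal units, which likely accounts for the small residual gap versus the observed 721GB):

```python
# Overheads as discussed in this thread (assumed, per the docs cited above).
BCS_OVERHEAD = 0.125   # block checksum: 12.5% of the array LUN
WAFL_RESERVE = 0.10    # WAFL reserve: 10%
SNAP_RESERVE = 0.05    # default aggregate snap reserve: 5%

lun_gb = 976                                        # the CX LUN presented to ONTAP
raw_disk_gb = lun_gb * (1 - BCS_OVERHEAD)           # 854.0 -- matches the figure seen
aggregate_gb = raw_disk_gb * (1 - WAFL_RESERVE) * (1 - SNAP_RESERVE)

print(round(raw_disk_gb))    # 854
print(round(aggregate_gb))   # 730 -- close to the observed 721GB
```

The ~9GB difference between the computed 730GB and the observed 721GB is consistent with the decimal/binary unit mixing mentioned above, not an additional hidden reserve.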

roshan2010
4,913 Views

I would suggest looking at the IBM SVC and placing that in front of all your disk systems. It can provide all of the features you require as well as reduce the cost of disk options.
