ONTAP Discussions

Most unusual application of dedupe?

friea

lfreeman mentioned that people are using deduplication in ways NetApp didn't originally expect. What's the most unusual application of dedupe anybody's aware of?


reinoud7

Hi Friea,

It's not really unusual, but here is the list of applications where we use dedupe today. Let me know if you find any of them unusual!

  • dumps of VMware

  • dumps of SQL and Sybase

  • our share with the sources of all our software

  • medical application in genetics

  • VMware production

We will expand this list in the coming months.

Reinoud

rkaramchedu1

Well,

Having supported the life sciences community for a few years, when I first heard of de-dupe I wanted to see whether de-dupe algorithms could be used to find the commonalities between genomic data sets across organisms, species, etc. However, since NetApp de-dupe works on a WAFL block, comparisons are restricted to 4KB. If that were configurable, the resulting scenarios could be interesting.

That would be a good test for NetApp admins currently supporting such data sets.
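
To illustrate why fixed 4KB comparisons are limiting for this kind of data, here is a minimal sketch. It is not how ONTAP is implemented: SHA-256 stands in for whatever fingerprint the filer uses, the "genome" is random synthetic data, and the function names are hypothetical. It shows that a byte-identical copy matches block for block, while the same sequence shifted by a single base shares no aligned 4KB block with the original.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # 4KB, the WAFL block size mentioned above


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block.

    SHA-256 is an assumption made for this sketch only.
    """
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_fraction(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of b's blocks whose fingerprint also occurs among a's blocks."""
    seen = set(block_fingerprints(a, block_size))
    blocks_b = block_fingerprints(b, block_size)
    if not blocks_b:
        return 0.0
    return sum(fp in seen for fp in blocks_b) / len(blocks_b)


# Synthetic stand-in for a genomic data set: 64 blocks of random bases.
rng = random.Random(0)
genome = bytes(rng.choice(b"ACGT") for _ in range(64 * BLOCK_SIZE))

aligned_copy = genome            # byte-identical copy
shifted_copy = b"A" + genome     # one inserted base moves every block boundary

aligned_fraction = shared_block_fraction(genome, aligned_copy)
shifted_fraction = shared_block_fraction(genome, shifted_copy)
print(aligned_fraction, shifted_fraction)
```

Passing a smaller `block_size` to `shared_block_fraction` does not help with the shifted copy either, because the boundaries are still fixed; finding commonalities at arbitrary offsets would need a different technique, such as content-defined chunking or sequence alignment.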

lfreeman

Hi rkaramchedu1, true, we are bound by the 4K WAFL blocks. Did you see any savings at all on your genomic data sets?

lfreeman

Hi reinoud7 - Can you give us an idea of the space savings (%) you are seeing on each of the applications you are deduping? And yes I'd say that the shares with your software source code and the genetics data are "unusual" - good stuff!

reinoud7

Of course, no problem:

  • dumps of VMware: today, it's just a VCB kind of backup: 49 % savings

  • dumps of SQL and Sybase: 66 - 68 % savings (7 full dumps of the same database)

  • our share with the sources of all our software: only 28 %

  • medical application in genetics: still testing, but so far we only see a saving of 8 %

  • VMware production: still in test, more details later, but at least 50 %

I forgot this one: all our invoices sent to our patients (more than one million per year): 52 % (these are PDF files)

Greetings, Reinoud

friea

Hey there ... just noticed you have 99 posts!! Congrats at nearly being into 3 digits ...

__M_Marotti_3892

Talking about unusual applications, has anyone tested SAS Institute application data with dedupe? I would like to know the savings.

Thanks,

lfreeman

Hi M_Marotti - I guess by the sound of the crickets in the background no one has tested SAS Institute data. In a case like this, I've seen people take two approaches:

1) Run the Space Savings Estimation Tool against a sample dataset. This tool will simulate dedupe and is available from your NetApp or authorized VAR SE.

2) SnapMirror a copy of the SAS data to a test/dev volume and run dedupe against that volume

Either of those approaches will help give you an idea of the space savings you'll see.
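
For a very rough first look before doing either, something like the sketch below can count duplicate 4KB blocks in a sample directory. To be clear, this is not the Space Savings Estimation Tool and does not reproduce its logic: `estimate_savings` is a hypothetical helper, SHA-256 is an assumed stand-in for the fingerprint, and the sample data is synthetic. It ignores compression, metadata, and how files are actually laid out on a volume, so treat the number as an upper-level hint only.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # 4KB blocks, as discussed earlier in the thread


def estimate_savings(root: str, block_size: int = BLOCK_SIZE) -> float:
    """Return the fraction of fixed-size blocks under root that are duplicates.

    Hypothetical helper for illustration; not a NetApp tool.
    """
    total = 0
    unique = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as fh:
                while True:
                    block = fh.read(block_size)
                    if not block:
                        break
                    total += 1
                    unique.add(hashlib.sha256(block).digest())
    return 0.0 if total == 0 else 1 - len(unique) / total


# Synthetic sample: 7 byte-identical "full dumps" of 8 random blocks each.
payload = os.urandom(8 * BLOCK_SIZE)
with tempfile.TemporaryDirectory() as sample_dir:
    for n in range(7):
        with open(os.path.join(sample_dir, f"dump_{n}.bak"), "wb") as out:
            out.write(payload)
    savings = estimate_savings(sample_dir)

print(f"estimated savings: {savings:.1%}")
```

Seven perfectly identical dumps give the ceiling of 6/7 (about 86 %). Real dumps taken on different days differ from each other, so lower figures such as the 66 - 68 % reported above for 7 full dumps are to be expected.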

Hope that helps...

Dr Dedupe
