Deduplication of SQL Server Backup Files

Hi everyone,

Does anyone have any good experience of de-duplicating SQL Server backups? We have a volume that holds .BAK backup files for a number of databases, with around a week's worth of daily backup files held for each database. Since the data churn is low, I thought I would get reasonable success de-duping this volume, but having turned it on and let it run through, I got less than 1% savings.

I don't know the mechanisms of SQL Server backups particularly well - maybe the backup algorithms and data compression mean that identical blocks are actually quite unlikely.

Just wondering if anyone has had similar experiences or successes where I have not?

Cheers,

Jon

Re: Deduplication of SQL Server Backup Files

Hi,

SQL backups have compression enabled by default (if memory serves right), and that may be why each backup of almost exactly the same database gets chopped into blocks differently, hence the poor dedupe savings.
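
To illustrate the point about compression, here is a minimal Python sketch, not a model of any particular filer. It assumes fixed 4 KB block dedupe and uses zlib as a stand-in for backup compression; the synthetic "backups", the page contents and the function names are all made up for the example.

```python
import hashlib
import random
import zlib

BLOCK = 4096  # assumed fixed dedupe block size


def make_page(seed: int) -> bytes:
    """Build one 4 KB page of compressible pseudo-random data (synthetic)."""
    rng = random.Random(seed)
    return bytes(rng.choices(b"ABCDEFGHIJKLMNOP", k=BLOCK))


def dedupe_savings(files: list[bytes]) -> float:
    """Fraction of fixed-size blocks that duplicate a block already seen."""
    total = 0
    unique = set()
    for data in files:
        for off in range(0, len(data), BLOCK):
            unique.add(hashlib.sha256(data[off:off + BLOCK]).digest())
            total += 1
    return 1 - len(unique) / total


# Two hypothetical "daily backups" of 64 pages that differ only in the first page.
day1 = b"".join(make_page(i) for i in range(64))
day2 = make_page(1000) + day1[BLOCK:]

raw = dedupe_savings([day1, day2])
packed = dedupe_savings([zlib.compress(day1), zlib.compress(day2)])
print(f"uncompressed: {raw:.1%}  compressed: {packed:.1%}")
```

On the uncompressed pair almost half of the blocks are duplicates, since only one page changed. On the compressed pair the savings should come out much lower, because a change early in the stream alters the compressed bytes that follow it and the 4 KB blocks no longer line up.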

I'd actually ask another question: why bother with SQL dumps in the first place? Why not use SnapManager for SQL instead, which will do exactly the same (create full logical copies via snapshots) with much less disk space used (delta changes only)?

Regards,

Radek

Re: Deduplication of SQL Server Backup Files

Thanks for the reply, Radek!

The point about compression makes sense. SnapManager for SQL is indeed an option that we can look at, but we were first of all looking at de-dupe across the board. However, given these results, I think we might change priorities and look at getting SnapManager for SQL in sooner.

Thanks again,

Jon

Re: Deduplication of SQL Server Backup Files

In my experience dedup with backup is not the best combination at all.

Often backup is done to SATA disks, and within a 24-hour schedule you want to first move the backup to disk, then copy the disk backup to tape, and only after that start a dedupe.

Then (on 2050s and 3020s) the dedupe will still be running when the new backup cycle starts, resulting in a very slow backup speed.

As change rates on production data are not as high as on backup volumes, you need totally different sizing of the filer.
For backup streams that are not fulls (or are incremental forever, as in SnapVault), dedupe does its thing; otherwise an inline dedupe capable device or VTL will be more appropriate.

Re: Deduplication of SQL Server Backup Files

Thanks both for your replies. I agree that the way we are handling SQL backups at the moment is not great and we are actively looking at changing that.

FWIW, I ended up getting circa 25% de-dupe on this SQL backup data, which is probably about what I expected in the first place .....

Re: Deduplication of SQL Server Backup Files

Are you doing anything different today? 

We are still dumping flat-file SQL backups to a CIFS share. We only keep our backups for 3 days, so dedupe doesn't really sound like a huge win. For us it's cheaper to send those backups to tape via UNC backup to a proxy server.

We are entertaining storing everything on the filer by eliminating daily fulls and replacing them with differential backups: one full, two diffs, and so on.
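
As a rough way to compare the two schedules, the sketch below adds up retained space for daily fulls versus one full followed by differentials. The sizes are hypothetical placeholders, not figures from this thread, and it assumes each differential holds everything changed since the last full, so it grows by about one day's change per day.

```python
def daily_fulls_gb(full_gb: float, days: int) -> float:
    """Space retained when every day keeps its own full backup."""
    return full_gb * days


def full_plus_diffs_gb(full_gb: float, daily_change_gb: float, days: int) -> float:
    """Space retained for one full followed by (days - 1) differentials.

    Assumes the differential taken k days after the full is roughly
    k * daily_change_gb (an upper bound if the same pages change repeatedly).
    """
    diffs = sum(k * daily_change_gb for k in range(1, days))
    return full_gb + diffs


# Hypothetical numbers: 100 GB full, 5 GB changed per day, 3 days kept.
print(daily_fulls_gb(100, 3))         # 300
print(full_plus_diffs_gb(100, 5, 3))  # 115
```

With a low change rate the full-plus-diffs schedule retains far less than daily fulls, though a restore then needs the full plus the latest differential.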

We are not interested in using SnapManager unless it can provide a DB backup that is usable for test and qual environments. We use our prod backups to refresh our test, dev and qual platforms. Our app dev folks do 10-50 restores a day, as some developers are refreshing dev environments multiple times a day.