2014-04-15 12:25 AM
I have a volume filled with lots of 7zip files containing MSSQL dumps. The volume holds about 2,000 7zip files, each a daily MSSQL dump sized 500MB-1.5GB.
Initially I thought deduplication was going to save a lot, but it turned out I was wrong; the savings weren't much. Does that happen with most zipped files? Has anyone experienced this before?
sis status -l output
Inline Compression: Disabled
Progress: Idle for 08:19:57
Minimum Blocks Shared: 1
Blocks Skipped Sharing: 0
Last Operation State: Success
Last Successful Operation Begin: Tue Apr 15 07:00:00 MYT 2014
Last Successful Operation End: Tue Apr 15 07:02:48 MYT 2014
Last Operation Begin: Tue Apr 15 07:00:00 MYT 2014
Last Operation End: Tue Apr 15 07:02:48 MYT 2014
Last Operation Size: 6978 MB
Last Operation Error: -
Change Log Usage: 0%
Logical Data: 1345 GB/49 TB (3%)
Queued Job: -
Stale Fingerprints: 0%
df -sh output
Filesystem used saved %saved
/vol/myvolume/ 1341GB 3772MB 0%
2014-04-15 12:32 AM
Dedupe works on WAFL 4K blocks. There are probably not many duplicate blocks in your DB dumps, and it's even less likely that there would be duplicate blocks in the zipped files.
If there were lots of duplicate files that had been zipped up, then dedupe would do great things.
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff
Senior Consultant, K&H Research http://www.khresear.ch/
Senior Instructor, Unitek Education http://www.unitek.com/training/netapp/
2014-04-16 07:57 PM
Thanks for the reply. My DB dumps were actually from the same DB, so I would have assumed there would be a lot of duplicates. My guess is that the 7zip compression algorithm made the files unique from each other... I don't know, but it's good to be aware of this.
2014-05-13 07:50 AM
I just saw this and it's probably already too late, but maybe it will help someone else.
The way most compression algorithms work is that even very small changes in the uncompressed file cause a "cascade" of differences throughout the compressed file. So even though the uncompressed source files might be 99% identical, the compressed files are totally different.
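You can see this cascade with a quick experiment. This sketch uses Python's zlib as a stand-in for 7zip's LZMA (the table name and row format are made up): one edited row barely changes the raw dump, but the compressed outputs diverge from the change point onward.

```python
import zlib

def common_blocks(x: bytes, y: bytes, size: int = 4096) -> float:
    """Fraction of aligned 4K blocks that are byte-identical in both buffers."""
    n = min(len(x), len(y)) // size
    if n == 0:
        return 0.0
    return sum(x[i*size:(i+1)*size] == y[i*size:(i+1)*size] for i in range(n)) / n

# Two dumps of the "same" database: identical except for one edited row near the start.
dump_a = b"".join(b"INSERT INTO t VALUES (%06d, 'payload');\n" % i
                  for i in range(20000))
dump_b = dump_a.replace(b"(000123,", b"(999123,", 1)

print(f"raw dumps:        {common_blocks(dump_a, dump_b):.1%} identical blocks")
print(f"compressed dumps: "
      f"{common_blocks(zlib.compress(dump_a), zlib.compress(dump_b)):.1%} identical blocks")
```

The raw dumps still share nearly every 4K block (only the block containing the edited row differs), while the compressed streams share almost none, which is exactly why dedup found so little to save.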
There is an option in newer versions of gzip called --rsyncable (http://superuser.com/questions/636881/what-are-good-compression-algorithms-for-delta-synchronization) which slightly increases the size of the compressed archive but resynchronizes the compressed output with the uncompressed input at frequent intervals. That way the compressed output stays mostly similar even when the uncompressed input has small changes.
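The same idea can be approximated in Python with zlib's Z_FULL_FLUSH, which resets the compressor state at chunk boundaries so identical input chunks always compress to identical bytes. This is a sketch of the principle, not gzip's actual --rsyncable implementation (which picks boundaries from the data rather than at fixed offsets):

```python
import zlib

def compress_resyncable(data: bytes, chunk: int = 32768) -> list[bytes]:
    """Compress data as a list of per-chunk pieces, resetting compressor
    history at every boundary with Z_FULL_FLUSH. (Demo only: the pieces are
    not finalized into a complete zlib stream.)"""
    co = zlib.compressobj()
    return [co.compress(data[i:i + chunk]) + co.flush(zlib.Z_FULL_FLUSH)
            for i in range(0, len(data), chunk)]

# Same two near-identical dumps as before (hypothetical row format).
dump_a = b"".join(b"INSERT INTO t VALUES (%06d, 'payload');\n" % i
                  for i in range(20000))
dump_b = dump_a.replace(b"(000123,", b"(999123,", 1)

a_pieces = compress_resyncable(dump_a)
b_pieces = compress_resyncable(dump_b)
same = sum(p == q for p, q in zip(a_pieces, b_pieces))
print(f"{same} of {len(a_pieces)} compressed chunks are byte-identical")
```

Only the chunk containing the edit produces different compressed bytes; every chunk after it resynchronizes, giving dedup (or rsync) something to work with again.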
Give it a try and let us know if it helps.