Dedupe or Zip

I have a directory on a CIFS share that has 20,000 files in 250 folders, totaling 1.9TB of disk space.

These files are weekly backup copies of data files.  They compress insanely well if zipped (95% according to WinZip) b/c they have a lot of whitespace in them.  So I assume they dedupe extremely well too.

When I look at the properties for this group of folders, it says it's taking up 1.9TB of space.  Is that a non-dedupe representation of how much space these files are taking up?

If I zip the files up, will I gain back some storage capacity or will I lose available space because the new zip files won't dedupe as well as the existing data files that are already deduped?

Re: Dedupe or Zip

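That's a non-dedupe representation of the space used by the files.  Based on what you are saying, it depends on how many weeks of backup copies you keep.  At a 95% compression rate each zipped copy takes about 5% of its original size, so 20 zipped copies use as much space as one uncompressed copy.  If the files never changed, and they dedupe perfectly, all the copies together would take roughly the space of one, so it would take 20 weeks for zipping to break even, give or take some dedupe yield if the files happen to have identical aligned 4k blocks within them.  Keep more weeks than that and dedupe comes out ahead.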
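
To put rough numbers on that break-even point, here is a minimal Python sketch. It assumes the best case for dedupe (every weekly copy identical, so only one copy is stored) and the 95% figure from WinZip; the function names are made up for illustration.

```python
def zipped_space(copies, copy_size, compression_ratio):
    """Space used if every weekly copy is zipped separately.

    compression_ratio is the fraction saved, e.g. 0.95 for 95%.
    """
    return copies * copy_size * (1.0 - compression_ratio)


def ideal_deduped_space(copies, copy_size):
    """Best case for dedupe: all copies identical, so one copy is stored."""
    return copy_size if copies > 0 else 0.0


if __name__ == "__main__":
    copy_size = 1.0  # work in units of "one uncompressed copy"
    for copies in (10, 20, 40):
        z = zipped_space(copies, copy_size, 0.95)
        d = ideal_deduped_space(copies, copy_size)
        print(f"{copies:>3} copies: zipped ~{z:.2f}, ideal dedupe ~{d:.2f}")
```

Files that change from week to week will dedupe worse than this ideal case, so the real break-even point will be later than 20 weeks.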

Have you considered enabling compression on that volume? If the files compress that well, compression+dedupe may be the winning combination.  Then think about making snapshot copies instead of backup copies.

Re: Dedupe or Zip

Each folder is a different copy of the datasets.  So I have 247 copies of these files.  After spot checking some older folders and comparing them to more recent folders, the datasets are steadily growing in size week after week.  I don't know what's in them. 

Can you tell me what the 1.9TB disk usage number represents?  Is that the non-deduped equivalent amount of disk space these files take up or the actual deduped amount of disk space used?  Either way, this one location that holds these 247 versions of 77 files is taking up 17% of my storage on this volume.

What would enabling compression do to the performance for file access to this CIFS volume?  This is the most heavily used volume amongst our 300 employees.  The daily data churn when I look at the nightly SnapVault snapshots is anywhere from 30GB to 180GB.

Re: Dedupe or Zip

Dedupe doesn't change the size of the files.  That number should represent the size of the files, regardless of dedupe.  Space savings from dedupe appear at the volume level.

Enabling compression has two predictable effects.  The CPU load on your controllers will increase, and the disk utilization will decrease.  By how much either way is an "it depends" conversation.  It's usually transparent to the end user unless the controllers are overloaded.  As the storage admin, that's a judgement call only you can make.

Since these are backup copies, why not put them in a separate volume?  You could test the storage efficiency from enabling dedupe and compression without impacting your production NAS volume.
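
If you want a ballpark before carving out a test volume, something like the Python sketch below could be run against a small sample of the folders. It hashes fixed, aligned 4 KiB blocks (an assumption, based on the 4k block size mentioned earlier in the thread) and runs zlib over each file. The function name and the returned keys are made up for illustration, and the numbers will not match what the controller actually reports.

```python
import hashlib
import os
import zlib

BLOCK = 4096  # assumed dedupe block size (aligned 4 KiB blocks)


def estimate_savings(root):
    """Walk root and estimate block-level dedupe and per-file compression."""
    seen = set()        # digests of every distinct block seen so far
    total_blocks = 0
    raw_bytes = 0
    zipped_bytes = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            comp = zlib.compressobj()  # one compression stream per file
            with open(path, "rb") as fh:
                while True:
                    block = fh.read(BLOCK)
                    if not block:
                        break
                    total_blocks += 1
                    raw_bytes += len(block)
                    seen.add(hashlib.sha256(block).digest())
                    zipped_bytes += len(comp.compress(block))
            zipped_bytes += len(comp.flush())
    return {
        "total_blocks": total_blocks,
        "unique_blocks": len(seen),
        "raw_bytes": raw_bytes,
        "zipped_bytes": zipped_bytes,
    }
```

Comparing unique_blocks to total_blocks gives a rough dedupe ratio, and zipped_bytes to raw_bytes a rough compression ratio.  Reading the whole 1.9TB over CIFS would take a long time and add load, so point it at a few weekly folders rather than the whole tree.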

Re: Dedupe or Zip

I like that idea of giving him another volume with compression and dedupe enabled.  I'll give it a shot when I return from vacation.