Question on Deduplication

I took a 5TB volume of data, deduplicated it, and achieved a 31% space savings. When the customer mounts a CIFS share at the root of the volume from a Windows client, it shows the volume is 5TB in size with 5.8GB of space left. I know the NetApp is reporting block usage and Windows reports on file/folder usage. But how does this benefit us? The Windows clients still think the volume is almost full when it really isn't. Thoughts/suggestions?

Re: Question on Deduplication

Are the blocks that were deduped and "removed" still stuck in the snapshots, and are they bigger than the snap reserve (thus encroaching on the filesystem)?

My deduped volumes report the space after dedupe to the clients, which is what we'd expect.

Bill

Re: Question on Deduplication

Hi and welcome to the Community!

I deduplicated the data and achieved a 31% space savings.

That's cool, but did you have any existing snapshots by any chance? They have a nasty habit of growing whilst dedupe is running:

http://communities.netapp.com/message/22012#22012

Normally, if the space is 'genuinely' saved (i.e. blocks are deduped, but no snapshots are eating just released space), clients should see more available capacity in a file share immediately.
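To make the snapshot effect concrete, here is a rough sketch of the arithmetic. It is a simplified model, not NetApp's actual accounting: the function name, parameters, and sample numbers are all made up for illustration, and it assumes that snapshot usage beyond the snap reserve spills into the space clients can use.

```python
def client_visible_free(volume_size_gb, snap_reserve_pct, active_used_gb, snapshot_used_gb):
    """Hypothetical model of the free space a client would see on a share."""
    # Space set aside for snapshots is not offered to clients.
    reserve_gb = volume_size_gb * snap_reserve_pct / 100
    usable_gb = volume_size_gb - reserve_gb
    # Assumption: snapshot data beyond the reserve eats into usable space.
    overflow_gb = max(0.0, snapshot_used_gb - reserve_gb)
    return max(0.0, usable_gb - active_used_gb - overflow_gb)


# Illustrative numbers only: a 1000GB volume with a 20% snap reserve.
before = client_visible_free(1000, 20, active_used_gb=700, snapshot_used_gb=100)
# Dedupe frees 200GB from the active file system, but existing snapshots
# still hold those blocks, so snapshot usage grows by the same 200GB.
after = client_visible_free(1000, 20, active_used_gb=500, snapshot_used_gb=300)
print(before, after)
```

In this made-up case dedupe released 200GB, but clients gain only 100GB because the snapshots outgrow the reserve by the other 100GB. Once those snapshots expire or are deleted, the rest of the savings should show up.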

Regards,
Radek

Re: Question on Deduplication

I turned off quotas on the volume and Windows reported the space savings correctly. Yes!! I then turned quotas back on, and quota initialization is now scanning all of the data again. We'll see whether turning them back on puts me back to square one or not.

Re: Question on Deduplication

After much research, I found that quotas and deduplication don't mix well if end users want to see their space savings. I turned quotas back on and the problem showed up again.

Pulled from a Technical Report from NetApp on Deduplication:
http://media.netapp.com/documents/tr-3505.pdf

6.4.1 QUOTAS

For deduplicated files, the logical (undeduplicated) size is charged against the quotas. There are several advantages to this scheme as opposed to charging quotas based on the physical (deduplicated) size of the file:

This is in line with the general design principle of making deduplication transparent to the end user.
It is easier for system administrators to manage quotas. They can maintain a single quota policy across all volumes, whether or not deduplication is enabled on it.
There are no out-of-space failures when data is being sent using SnapMirror from a volume with deduplication enabled to a destination volume that has deduplication disabled.
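As a rough illustration of what the TR excerpt describes, the sketch below charges a quota by logical size while tracking physical size separately. The function, the field names, and the file sizes are hypothetical; this is not how Data ONTAP computes anything internally, only the arithmetic the excerpt implies.

```python
def quota_report(files, quota_limit_gb):
    """files: list of (logical_gb, physical_gb) pairs; all values are made up."""
    logical_gb = sum(logical for logical, _ in files)
    physical_gb = sum(physical for _, physical in files)
    return {
        # Per the excerpt, the quota is charged the undeduplicated size.
        "charged_gb": logical_gb,
        # The deduplicated size is what actually occupies blocks on disk.
        "physical_gb": physical_gb,
        # Quota headroom is computed from the logical charge only.
        "quota_free_gb": max(0, quota_limit_gb - logical_gb),
    }


# Three files, 10GB logical in total; dedupe shrinks the second from 4GB to 1GB.
report = quota_report([(4, 4), (4, 1), (2, 2)], quota_limit_gb=10)
print(report)
```

Here the data occupies only 7GB on disk, yet the quota is fully consumed. That would be consistent with the Windows client seeing almost no free space while quotas are on, assuming the client's free-space figure is derived from the quota.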

Re: Question on Deduplication

Actually, it works the way I would wish it to. I was wondering the other day whether quotas count the logical "real" space or the deduped/compressed space.

It's much more difficult to explain to the user "your files might take up 10GB, or maybe 7.5GB, or if you are lucky just 2.5GB of your quota".

This way the users can easily understand that 10GB of their data is going to take 10GB on the Filer.

Any savings made in the background, whether by dedupe or compression, are only going to be visible to the system manager/owner.

You should rethink what you use quotas for: as a safety limit or as a way of selling space.

Also, for a group of users you could use simple volumes as a kind of space limitation instead of quotas. That way they would see the available volume space all the time.