2008-08-21 07:19 AM
Let's say for example that I am running a file-level backup of a volume that has assorted Excel and Word files upon it and A-SIS (dedupe) is enabled. It would seem to me that the backup is only going to be backing up the deduped files and that if I try to restore one of these files at a later date, it will fail due to it only being a partial file that was backed up originally.
Perhaps my mind isn't comprehending the logic behind dedupe but if someone could share some insight into this, I would truly appreciate it.
2008-08-28 03:17 PM
Files aren't de-dup'd - WAFL blocks are; therefore your backups won't be affected in the slightest.
The beauty of de-dup is that anyone or anything accessing the data (for normal access or for backup) sees exactly what it saw before deduplication, while fewer blocks are stored on the file system.
2008-08-28 03:47 PM
You present a common misunderstanding of how deduplication works, and frame it in an excellent use case for explaining it - I applaud you for that!
As has been mentioned, data is deduplicated at the Block level, so the effects are hidden when you look at data from a file level context.
For example: if you look at two identical 1 MB PDFs sitting on a volume, one of which has been deduplicated against the other, the system will still show them as taking up 2 MB in total.
Whereas from a volume perspective, only 1 MB in total would be used.
What this means when you perform a file-level backup (traditional backup, robocopy, copy, etc.) is that each file will be read in full from beginning to end, taking up 2 MB of data on your destination.
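To make the 2 MB vs. 1 MB point concrete, here is a minimal Python sketch of a toy volume where files are just lists of pointers into a shared block store. It is not how WAFL is implemented: the class name `DedupVolume`, the helper `make_file_data`, the file names, and the use of SHA-256 at write time are all assumptions made for illustration, and the 4 KB block size is the only detail meant to resemble the real thing.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB blocks; everything else here is a toy model


class DedupVolume:
    """Hypothetical toy volume: files are ordered lists of block pointers."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes (each unique block stored once)
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # identical blocks share one copy
            pointers.append(fp)
        self.files[name] = pointers

    def read(self, name):
        # A reader (or a file-level backup) always gets the whole file back.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        return sum(len(self.read(name)) for name in self.files)

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())


def make_file_data(size):
    # Build data whose blocks all differ, so savings come only from the second file.
    return b"".join(
        i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(size // BLOCK_SIZE)
    )


ONE_MB = 1024 * 1024
vol = DedupVolume()
pdf = make_file_data(ONE_MB)
vol.write("a.pdf", pdf)
vol.write("b.pdf", pdf)  # identical content, so no new blocks are stored

# A file-level backup reads every file in full through the pointers.
backup = {name: vol.read(name) for name in vol.files}
backup_bytes = sum(len(data) for data in backup.values())

print(vol.logical_bytes(), vol.physical_bytes(), backup_bytes)
```

The file view and the backup both come to 2 MB, while the block store holds only 1 MB, and each backed-up file is byte-for-byte complete.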
If you want to keep the space savings on the destination as well, ways to work around this would be to take advantage of snapshots (block level), block-based backups (NDMP), and block-based mirrors (VSM SnapMirror).
So, to address your question about potentially partial files: if the data is treated as files, you will get all of the data you originally had (no partial files).
And even when retrieving the data at the block level, you will still have full access to it, since the pointers still reference live data; it just takes up less space.
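The "pointers still reference live data" idea can be sketched with simple reference counting. This is only an illustration under assumptions: the class `RefCountedStore`, its methods, and the file names are hypothetical, and the real file system tracks shared blocks in its own way.

```python
import hashlib


class RefCountedStore:
    """Hypothetical toy store: a shared block is freed only when nothing points to it."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> [block bytes, reference count]
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, chunks):
        pointers = []
        for chunk in chunks:
            fp = hashlib.sha256(chunk).hexdigest()
            entry = self.blocks.setdefault(fp, [chunk, 0])
            entry[1] += 1  # one more pointer to this block
            pointers.append(fp)
        self.files[name] = pointers

    def delete(self, name):
        for fp in self.files.pop(name):
            entry = self.blocks[fp]
            entry[1] -= 1
            if entry[1] == 0:
                del self.blocks[fp]  # no file references it any more

    def read(self, name):
        return b"".join(self.blocks[fp][0] for fp in self.files[name])


store = RefCountedStore()
chunks = [b"header", b"body", b"footer"]
store.write("report.doc", chunks)
store.write("copy.doc", chunks)      # shares all three blocks with report.doc

store.delete("report.doc")           # copy.doc still points at the same blocks
remaining = store.read("copy.doc")
blocks_after_first_delete = len(store.blocks)

store.delete("copy.doc")             # last pointers gone, blocks are released
blocks_after_second_delete = len(store.blocks)
```

Deleting one of the two files leaves the other fully readable, because the shared blocks stay live for as long as any file still points to them.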
Hopefully this helps scratch the surface of how deduplication fits into the big picture of block data vs. file data.
Look forward to more great questions like this from you Brian!
2008-08-29 07:01 PM
Thank you too for your kind words Brian, I appreciate it and I'm glad I could help.
On the subject of deduplication, you may want to invest some time in reading the associated white papers and technical reference papers, which cover many of the benefits, best practices, and other general information.
If you haven't looked at the benefits dedup can provide for your particular organization (be it VMware, file servers, or other data), I'd definitely suggest looking into where you could begin to realize further benefits.
Page 42 covers Deduplication and Thin provisioning with VMware
Hopefully this additional information helps ensure your deduplication experience is an even more promising one!
Thanks and take care!