2008-10-24 05:08 PM
Soon we will be using the CIFS functionality of our NetApp 3140, which prompted me to do some testing with deduplication and snapshots. It seems as though any snapshots are not deduplicated. As a POC, you can do the following.
Create a 5G CIFS volume.
Copy a 1G file to CIFS volume.
4G left on CIFS volume.
Take a snapshot.
Delete 1G file.
Still 4G left on CIFS volume because data is still in snapshot, which is to be expected.
Copy SAME 1G file to CIFS volume.
Run dedupe process.
Now only 3G available!
I can loop this process until there's literally no space left despite the fact that I'm using the same 1G file over and over again. Now obviously if I copy the same 1G file to the volume twice before any snapshots, and THEN run deduplication, I will immediately get a 50% space savings. The point here is that the deduplication process doesn't dedupe snapshots, only primary data.
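The steps above can be sketched as a toy block-accounting model. This is only an illustration of the observed behaviour, not how WAFL or the dedupe engine is actually implemented: the `ToyVolume` class, its method names, and the 1G "block" granularity are all made up for the example. The assumption being modelled is that dedupe only shares blocks among the active data, and that any block referenced by a snapshot stays allocated.

```python
# Toy model only: names and granularity are hypothetical, one block = 1G.
# Assumption: dedupe looks at active data only and never at snapshot-only blocks.

class ToyVolume:
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.content = {}      # block id -> content label
        self.active = []       # block ids referenced by the live file system
        self.snapshots = []    # each snapshot is a frozen set of block ids
        self._next_id = 0

    def used_gb(self):
        # A block is in use if the active data or any snapshot references it.
        held = set(self.active)
        for snap in self.snapshots:
            held |= snap
        return len(held)

    def free_gb(self):
        return self.size_gb - self.used_gb()

    def write(self, label):
        if self.free_gb() < 1:
            raise RuntimeError("volume full")
        block = self._next_id
        self._next_id += 1
        self.content[block] = label
        self.active.append(block)

    def delete(self, label):
        # Drop one active reference; the block survives if a snapshot holds it.
        for block in self.active:
            if self.content[block] == label:
                self.active.remove(block)
                return
        raise KeyError(label)

    def snapshot(self):
        self.snapshots.append(frozenset(self.active))

    def dedupe(self):
        # Point every active duplicate at the first active block with that content.
        keeper = {}
        self.active = [keeper.setdefault(self.content[b], b) for b in self.active]


def run_poc():
    """Repeat snapshot / delete / re-copy / dedupe until the volume is full."""
    vol = ToyVolume(size_gb=5)
    vol.write("file")
    history = [vol.free_gb()]
    while vol.free_gb() > 0:
        vol.snapshot()
        vol.delete("file")
        vol.write("file")
        vol.dedupe()
        history.append(vol.free_gb())
    return history


def two_copies_then_dedupe():
    """Copy the same file twice before any snapshot, then dedupe."""
    vol = ToyVolume(size_gb=5)
    vol.write("file")
    vol.write("file")
    before = vol.free_gb()
    vol.dedupe()
    return before, vol.free_gb()


print(run_poc())                 # [4, 3, 2, 1, 0]
print(two_copies_then_dedupe())  # (3, 4)
```

In this model each pass through the loop pins one more 1G block in a snapshot, so free space drops by 1G per pass even though the content is identical, while two copies made before any snapshot collapse to a single block.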
I would have expected the dedupe process to find duplicated data between all the snapshots, as well as duplicated data between the snapshots and primary data. By this design it seems necessary to dedupe before you take a snapshot. Even then, you will only dedupe primary data.
Is there something I don't understand, or a technical reason why NetApp is not taking advantage of deduplicating the snapshots?
2008-10-26 08:02 AM
You might want to look at enabling "File Folding", which is the process of checking the data in the most recent Snapshot copy and, if it is identical to the data in the Snapshot copy currently being created, just referencing the previous Snapshot copy instead of taking up space by writing the same data twice. The command to enable this is...
options cifs.snapshot_file_folding.enable on
2008-10-27 11:33 AM
Well, I know that with the 7.3 software the pointers are in the aggregate and not the volume. However, we are running the 7.3 software and the snapshots are still not being deduplicated, so my guess is that NetApp is gearing up for that, hopefully in the next release. It's a pretty big issue for us, because we're planning on keeping snapshots for a long time. File Folding was helpful, thanks.
2008-12-08 07:39 AM
Glad to hear file folding was useful :-)
On the dedupe front, the trick is really to ensure deduplication runs are completed before any "long term" Snapshot copies are kept.
In 7.3, there is tighter integration with SnapVault (secondaries): if deduplication is on for the secondary, a deduplication run happens after each transfer, and then some clever snapshot "auto-jiggery-pokery" ensures the resulting SnapVault Snapshot (backup) contains the (hopefully reduced!) data after this deduplication :-)
FYI, Snapshots are NOT deduplicated in 7.3.
2009-01-05 01:12 PM
The thing to keep in mind is that snapshots are always read only. Even the "writeable snapshots" are really a snapshot + a difference file.
Make sure ASIS has finished (sis status) before creating a snapshot on a volume.
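A rough sketch of that ordering, for anyone scripting it. Everything here is hypothetical: `get_status` stands for whatever you use to fetch the `sis status` text for the volume (for example over SSH), `take_snapshot` for whatever creates the snapshot, and the check for the word "Idle" is an assumption about what the status text looks like once the run has finished, so verify it against your own filer's output first.

```python
import time


def snapshot_after_dedupe(get_status, take_snapshot,
                          poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll the dedupe status and take the snapshot only once it looks idle.

    get_status    -- hypothetical callable returning the `sis status` text
    take_snapshot -- hypothetical callable that creates the snapshot
    Returns True if the snapshot was taken, False if polling timed out.
    """
    for _ in range(max_polls):
        # Assumption: a finished run shows up as "Idle" in the status text.
        if "idle" in get_status().lower():
            take_snapshot()
            return True
        sleep(poll_seconds)
    return False
```

Passing the two callables in keeps the waiting logic separate from how you actually reach the filer, so it can be tried out with fake functions before pointing it at real hardware.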