
Deduplicating snapshots

I have a RHEL5 VM on vSphere 4.1, running InterSystems Caché. All storage is currently in VMDK files on an NFS datastore on a FAS2040 (7.3.5P1). Every night, Caché does a full dump of its database (~120GB) to an independent VMDK located on a volume that does not take snapshots; this dump is subsequently backed up to tape using Backup Exec Remote Agent. Each dump overwrites the previous one. I'm wondering if it's possible to create a volume specifically for those backups, export it over NFS, mount the NFS share directly inside the VM, run the dump, then snap it and have asis deduplicate the data between nightly backups (which doesn't change that much), so that I could keep older backups online in the .snapshot folder rather than restore from tape when I need one.

Re: Deduplicating snapshots

Hi Boris

Sure you can, but I really would not dump into the volume where your VMware datastores are.

Create a new volume, enable dedupe (sis on), export it via NFS to the RHEL host you want to dump from, mount the export - DONE

You will not get a very high dedupe rate, because the dumps will be mostly unique blocks.

Hope this helps

Peter

Re: Deduplicating snapshots

I think I'm doing something wrong there, but I can't figure out what. For the time being, I'm testing with Windows and CIFS - I created a 10GB volume, enabled deduplication, shared it via CIFS, and copied a 1GB file there. Manually ran asis on the volume, created a snapshot, then copied the same file there again, overwriting the original - but it's the very same file. Ran asis again, and it didn't find anything to deduplicate, and now I had a total of 20% used space on the volume. Took another snapshot, ran asis - nothing. Copied the file again, ran asis again - still nothing, and 30% used. What's the proper way to have asis deduplicate between current data and a snapshot and/or existing snapshots? Is there one?

Re: Deduplicating snapshots

Hi Boris

When you write "ran asis manually", what was the command, and did you run "sis on" at the very beginning after creating the volume?

I've followed this process and it worked:

create vol

create cifs share

sis on

sis config -s auto "vol"

copy 1,8G win2k8 iso image to share

copy the same 1,8G win2k8 iso image to share

run "sis start -s "vol""

check with df -hs "vol"

saw the 50% saved...

Hope this helps,

Peter
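To see why the steps above end at roughly 50% saved, here is a minimal Python sketch of block-level deduplication within the active file system. It is a toy model, not how the filer is implemented: the 4 KB block size matches WAFL, but the function names, the SHA-256 fingerprints, the file names, and the synthetic payload are all assumptions made for illustration. It also assumes the second copy of the ISO landed under a different name, so both copies live in the active file system at the same time.

```python
import hashlib

BLOCK_SIZE = 4096  # WAFL block size; everything else here is a simplification


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedupe_savings(files):
    """Return (logical_blocks, unique_blocks) over all files in the active file system."""
    logical = 0
    fingerprints = set()
    for data in files.values():
        for block in split_blocks(data):
            logical += 1
            fingerprints.add(hashlib.sha256(block).digest())
    return logical, len(fingerprints)


# Synthetic stand-in for the ISO: 100 distinct 4 KB blocks (hypothetical data).
payload = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))

# Two copies of the same payload under different (hypothetical) names.
volume = {"win2k8.iso": payload, "win2k8_copy.iso": payload}

logical, unique = dedupe_savings(volume)
saved_pct = 100 * (logical - unique) // logical
print(f"{logical} logical blocks, {unique} stored, {saved_pct}% saved")
# prints: 200 logical blocks, 100 stored, 50% saved
```

Every block of the second copy has a twin in the first copy, and both are in the active file system, so half of the blocks can be shared.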

Re: Deduplicating snapshots

I created the volume from system manager and checked 'enable deduplication' on creation. To start deduplication I ran 'sis start /vol/test_bak' from CLI. I just tried the same thing with the -s switch, but it didn't help - I've got a 1GB snapshot and 1GB data on the volume, and asis can't find anything to deduplicate.

Re: Deduplicating snapshots

Hhhhmmm... Can you copy/paste the output of these CLI commands?

df -hs test_bak

and

df -h test_bak

and

sis stats /vol/test_bak

Also, which file are you copying? Are you 100% sure that the file you are copying has duplicate data which CAN be deduplicated?

Peter

Re: Deduplicating snapshots

netapp2> df -hs test_bak
Filesystem                used      saved       %saved
/vol/test_bak/          2053MB        0MB           0%
netapp2> df -h test_bak
Filesystem               total       used      avail capacity  Mounted on
/vol/test_bak/            10GB     2053MB     8186MB      20%  /vol/test_bak/
/vol/test_bak/.snapshot        0MB     1026MB        0MB     ---%  /vol/test_bak/.snapshot
netapp2> sis status /vol/test_bak
Path                           State      Status     Progress
/vol/test_bak                  Enabled    Idle       Idle for 00:18:27

The *file* itself does not have any data to be deduplicated - it's an encrypted archive. However, the snapshot copy and the live copy are exactly the same file - what I need is for the filer to deduplicate the live data against snapshot(s), if that is at all possible.

Re: Deduplicating snapshots

AHA, now I understand...

You cannot dedupe from Snapshot to Active File System.

The data in the Active File System can be deduplicated (if the content allows it, and you are right, the kind of file you are testing with does not). The Snapshots are a READ-ONLY copy of the AFS inode (incl. pointers) and therefore cannot be deduped separately.

All you can do is dedupe the data in the AFS and then snapshot the volume; that saves space in the snapshot as well, because the blocks are already multi-pointered by the AFS dedupe process.

I hope this answers your question,

Peter
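The behaviour described in this reply can be illustrated with a small Python toy model. It is only a sketch of the rule "dedupe looks at blocks referenced by the active file system, snapshots just pin old blocks", as stated in this thread; the class, its methods, and the file name `dump.bin` are hypothetical and do not reflect how WAFL or the sis engine is implemented.

```python
import hashlib


def fp(block):
    """Fingerprint of a block (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(block).hexdigest()


class ToyVolume:
    """Hypothetical model: physical blocks have ids; files and snapshots are lists of ids."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes
        self.active = {}     # file name -> list of block ids (active file system)
        self.snapshots = []  # each snapshot is a frozen copy of `active`
        self._next_id = 0

    def write_file(self, name, chunks):
        """Write (or overwrite) a file; new data always goes to new blocks."""
        ids = []
        for chunk in chunks:
            self.blocks[self._next_id] = chunk
            ids.append(self._next_id)
            self._next_id += 1
        self.active[name] = ids

    def snapshot(self):
        """Freeze the current pointers; the referenced blocks stay pinned."""
        self.snapshots.append({n: list(ids) for n, ids in self.active.items()})

    def dedupe_active(self):
        """Share identical blocks, but only among blocks the active file system references."""
        seen = {}  # fingerprint -> donor block id
        shared = 0
        for ids in self.active.values():
            for pos, bid in enumerate(ids):
                donor = seen.setdefault(fp(self.blocks[bid]), bid)
                if donor != bid:
                    ids[pos] = donor
                    shared += 1
        return shared

    def used_blocks(self):
        """Blocks still referenced by the active file system or by any snapshot."""
        live = {b for ids in self.active.values() for b in ids}
        for snap in self.snapshots:
            live |= {b for ids in snap.values() for b in ids}
        return len(live)


data = [bytes([i]) * 4096 for i in range(10)]  # 10 distinct blocks, no internal duplicates

vol = ToyVolume()
vol.write_file("dump.bin", data)
vol.dedupe_active()
vol.snapshot()
vol.write_file("dump.bin", data)  # overwrite with identical content
shared = vol.dedupe_active()
print(shared, vol.used_blocks())  # prints: 0 20
```

The overwrite lands in new blocks, the only identical blocks are the ones held by the snapshot, and the dedupe pass never considers those. Usage therefore doubles, which matches the 2053MB used with 1026MB in .snapshot shown in the df output earlier in the thread.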

Re: Deduplicating snapshots

I thought it might be smart enough to compare current data with existing snapshots, but apparently not. A bit of googling turned up the option cifs.snapshot_file_folding.enable, which seems to do this very thing for CIFS clients - is there an NFS counterpart?

Re: Deduplicating snapshots

From what I know, this only works with CIFS (but maybe an NFS expert can give some advice here).

But also be careful with this option: it can save space, but it can also negatively affect system performance, because WAFL has much more to do when handling these folded files and their blocks.
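For readers wondering what "folding" means in practice, here is a rough Python sketch of the idea, and of why it costs extra work. It assumes that folding compares each block of a rewritten file with the block at the same offset in the most recent snapshot and reuses the snapshot block when the two are identical. That is a simplification; the function name, the data layout, and the two changed blocks are hypothetical, and the real WAFL mechanism may differ.

```python
def fold_against_snapshot(new_blocks, snapshot_block_ids, physical):
    """Return (block ids for the rewritten file, number of reused blocks).

    A block is reused when the snapshot holds identical content at the same
    offset; otherwise a new physical block is allocated. Note the extra
    comparison per block: this is the additional work folding implies.
    """
    result = []
    reused = 0
    next_id = max(physical, default=-1) + 1
    for offset, chunk in enumerate(new_blocks):
        old_id = snapshot_block_ids[offset] if offset < len(snapshot_block_ids) else None
        if old_id is not None and physical[old_id] == chunk:
            result.append(old_id)  # identical to the snapshot: keep pointing at it
            reused += 1
        else:
            physical[next_id] = chunk  # changed or new data: allocate a block
            result.append(next_id)
            next_id += 1
    return result, reused


# Last night's dump as held by the snapshot: 10 distinct 4 KB blocks.
physical = {i: bytes([i]) * 4096 for i in range(10)}
snapshot_ids = list(range(10))

# Tonight's dump: same content except for two changed blocks (hypothetical).
tonight = [physical[i] for i in range(10)]
tonight[3] = b"\xff" * 4096
tonight[7] = b"\xfe" * 4096

ids, reused = fold_against_snapshot(tonight, snapshot_ids, physical)
print(reused, len(physical))  # prints: 8 12
```

In this toy run the second dump costs only 2 new blocks instead of 10, at the price of comparing every written block against the snapshot.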