Deduplication Schedule

Hello,

I have a question regarding the Data ONTAP deduplication schedule.

Is it best practice to schedule deduplication each night or once a week? This is on a FAS2040.

I have 5 volumes to dedup.

I also have an SMVI job that runs every two hours with a retention of 5 days on each volume.

I have noticed (but I am not sure) that when I don't dedupe every day, the SMVI snapshot sizes are smaller.

What is the best practice for my configuration?

Thank you in advance.

Yannick.

Re: Deduplication Schedule

You are right, deduping will cause the snapshots to be larger. That is because dedupe releases blocks from the live filesystem that are still trapped in a snapshot. All that means is you don't get the dedupe savings until the snapshot ages off, 5 days later. That's OK because you still get the savings!

In a perfect world I would recommend you dedupe before you snapshot, but deduping every 2 hours may not be practical, so you might want to stick with once per day. Maybe twice per day, since you snap so regularly. The more often you do it, the fewer blocks will become stuck in the snaps.

I would also try not to dedupe all the volumes at the same time. Stagger them by an hour. This helps ensure the dedupe process does not impact the performance of the system.
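As a rough illustration of the staggering idea, here is a minimal Python sketch that assigns each volume a start hour one hour apart and prints a matching command line. The volume names and the 22:00 starting hour are hypothetical, and the `sis config -s sun-sat@<hour>` form is my assumption of the Data ONTAP 7-Mode schedule syntax, so check it against the documentation for your release before using it.

```python
def staggered_schedule(volumes, first_hour=22, gap_hours=1):
    """Map each volume to a start hour (0-23), spaced gap_hours apart."""
    return {
        vol: (first_hour + i * gap_hours) % 24
        for i, vol in enumerate(volumes)
    }


if __name__ == "__main__":
    # Hypothetical volume names; replace with the five real volumes.
    volumes = ["vol1", "vol2", "vol3", "vol4", "vol5"]
    for vol, hour in staggered_schedule(volumes).items():
        # Assumed 7-Mode syntax: daily (sun-sat) at the given hour.
        print(f"sis config -s sun-sat@{hour} /vol/{vol}")
```

Starting late in the evening keeps the runs outside business hours; the start hour and gap are only parameters to adjust.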

Keith

Re: Deduplication Schedule

Hello Keith,

Thanks for the reply.

So the option of running dedupe once a week isn't good in this scenario?

I think I will do as you said and launch a dedupe once a day, with one hour between each job.

I run SMVI backups so often because each one updates the SnapMirror relationship to my secondary DR site.

Have a good day.

Yannick.

Re: Deduplication Schedule

Deduping once a week is fine; it just means that more blocks are stuck in the snaps until they expire. Deduping more often will also reduce the amount of traffic on your WAN, since SnapMirror is dedupe-aware and deduped blocks are not moved over the wire. If you deploy new VMs, I would kick off a custom dedupe run on that volume before the SnapMirror/SMVI job to minimize the transfer. Make sense?

Lots of SMVI jobs are not a bad thing. I wish more customers would schedule SMVI more often. It makes bringing back dead VMs MUCH easier...

Keith

Re: Deduplication Schedule

I have a 100 Mb link between the two sites, and I can use 60% of it for SMVI replication all day long.

When I dedupe every day, the snapshots are 2 to 3 times bigger, so will I lose a lot of space? When dedupe runs, all 12 snapshots per 24 hours grow...

And if I do it once a week, if I have understood your explanation, the jobs would take longer.

So I am a little lost. What is the best choice?

Yannick.

Re: Deduplication Schedule

The bigger snapshots are a bit of an illusion. Remember, the snapshot size is just the number of blocks that are unique to the snapshot. As you change or delete blocks in the active filesystem, the snapshot "grows". In your case, the number of blocks being transferred or used on the storage is exactly the same either way. However, when you deduplicate, that "deletes" blocks from the active filesystem, which makes the snapshots "grow" because more blocks are now used only by the snapshot. It does not affect how much storage you are using. You are NOT using more storage by deduplicating.

Let's use an example. Say you have 100GB of VMs and take a snapshot. At time zero that snapshot is 0MB, since it and the active filesystem both point at the same blocks. Now let's say an hour goes by and you have changed 10GB of data on the VMs. The VMs will still appear to be 100GB in size, but the snapshot now appears to be 10GB in size. Together they consume 110GB of space. So far so good?

Now let's dedupe the VMs. Let's say that saves us 50GB on the VMs (50% dedupe). That would "delete" 50GB of the blocks in the running VMs. However, we still need those blocks for the snapshot (remember, snapshots are read ONLY), so now the running VMs are 50GB and the snapshot is 60GB (10GB of changes and 50GB due to dedupe). The combination is still 110GB in size.

So why bother? Now what happens when the snapshot is deleted? Now the total space consumed becomes 50GB and you get the benefit of dedupe. So it's like a delay on the dedupe savings...
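The arithmetic in this example can be sketched in a few lines of Python. This is only a toy model of the accounting described above, not of how Data ONTAP tracks blocks internally; the `Volume` class and its method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Toy space accounting: sizes in GB, with a single snapshot."""
    active_gb: float         # blocks referenced by the active filesystem
    snapshot_gb: float = 0   # blocks held ONLY by the snapshot

    @property
    def total_gb(self):
        return self.active_gb + self.snapshot_gb

    def change_data(self, gb):
        # Overwritten blocks stay in the snapshot; the active size is unchanged.
        self.snapshot_gb += gb

    def dedupe(self, saved_gb):
        # Blocks freed from the active filesystem are still pinned by the snapshot.
        self.active_gb -= saved_gb
        self.snapshot_gb += saved_gb

    def delete_snapshot(self):
        # Only now is the snapshot-only space actually released.
        self.snapshot_gb = 0


vol = Volume(active_gb=100)      # snapshot taken at time zero
vol.change_data(10)              # one hour later: 100 + 10 = 110 GB
vol.dedupe(50)                   # 50 active + 60 snapshot = still 110 GB
total_after_dedupe = vol.total_gb
vol.delete_snapshot()            # snapshot ages off: 50 GB
print(total_after_dedupe, vol.total_gb)
```

The total stays at 110GB through the dedupe step and only drops to 50GB once the snapshot is gone, which is the delayed saving described above.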

Keith

Re: Deduplication Schedule

Thanks for the explanation.

So once per day seems to be the better choice?

Yannick.

Re: Deduplication Schedule

Yep

Re: Deduplication Schedule

OK, I will do so.

Thank you