Huge CIFS snapshots after dedupe

acistmedical

Somehow our CIFS snapshots get huge after dedupe.

For example, if I don't run dedupe for a week, my snaps are a few hundred MB each. I take them twice a day, at noon and midnight.

However, when I run dedupe, it somehow increases the size of the last snap. I just ran a full dedupe (it took 6 hours): my last snap was 230 MB before the dedupe, and the same snap is now 70 GB after the dedupe has run.

Also, if I schedule dedupe to run at 11 and then take a snap at 12, the snap is way bigger than if there is no dedupe scheduled at all.

What's going on??

The same thing happens on both controllers.

BTW, I'm running a full dedupe on another CIFS volume now and I can actually see the last snap growing as dedupe progresses. It was 124 MB before I started; when I checked at 39% of dedupe it was 21 GB, and now at 45% it has grown to 26 GB.

what the ???????

radek_kubka

Yeah, unfortunately snapshots are not de-dupe-aware (yet)...

So the thing is, your snapshot will normally grow by roughly as much as your de-dupe savings: from ONTAP's perspective the de-duped blocks have changed in the active file system, so the snapshot keeps holding the original blocks and they are counted against it.

Although not surprising to me, it's arguably very awkward. The only workaround I know of is to run de-dupe first on a volume with no snapshots & then take a snapshot(s). Obviously it doesn't help much if de-dupe is meant to be run regularly, not as a one-off.
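
A toy sketch of the block accounting may make this easier to picture. It is not ONTAP code: the mapping layout and the names dedupe, physical_ids and snapshot_size are invented for illustration, and the only assumption is that a snapshot pins every physical block that was in use when it was taken.

```python
# Toy model of block accounting; hypothetical names, not ONTAP internals.

def dedupe(active):
    """Point logical blocks with identical content at one physical block.

    `active` maps logical block number -> (physical_id, content).
    Returns a new mapping; the input is left untouched.
    """
    first_seen = {}  # content -> physical_id of the copy that is kept
    result = {}
    for lbn, (pid, content) in sorted(active.items()):
        keep = first_seen.setdefault(content, pid)
        result[lbn] = (keep, content)
    return result


def physical_ids(mapping):
    """Set of physical blocks referenced by a mapping."""
    return {pid for pid, _ in mapping.values()}


def snapshot_size(snapshot_ids, active):
    """Blocks held by the snapshot that the active file system no longer uses."""
    return len(snapshot_ids - physical_ids(active))


# Ten logical blocks on physical blocks 0..9, but only two distinct contents.
active = {lbn: (lbn, "A" if lbn < 5 else "B") for lbn in range(10)}

# Case 1: snapshot first, dedupe afterwards.
snap = physical_ids(active)                      # snapshot pins all ten blocks
before = snapshot_size(snap, active)
deduped = dedupe(active)
after = snapshot_size(snap, deduped)
used_case1 = len(snap | physical_ids(deduped))   # blocks still occupied on disk

# Case 2: dedupe first, snapshot afterwards.
deduped_first = dedupe(active)
snap2 = physical_ids(deduped_first)
size_case2 = snapshot_size(snap2, deduped_first)
used_case2 = len(snap2 | physical_ids(deduped_first))

print(before, after, used_case1)   # snapshot grows, nothing is freed
print(size_case2, used_case2)      # snapshot stays empty, space is freed
```

In this model the snapshot taken before dedupe grows by exactly the number of blocks dedupe released, while the volume as a whole frees nothing until that snapshot is gone.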

Anyone more knowledgeable with better current (coming?) workarounds / resolutions?

Regards,
Radek

acistmedical

So does that mean that dedupe is really useless if using snaps?

Because all the space saved by dedupe will be taken by a large snap?

Also, what is file folding? Would any of that help?

I have found this command: options cifs.snapshot_file_folding.enable on. What would that do? Are there any downsides or issues with using it?

Thanks

radek_kubka

I am not aware of anything more complete & current than TR-3505 (a.k.a. the De-dupe Bible):

http://communities.netapp.com/docs/DOC-1642

Page 20 says:

NETAPP WHITE PAPER

For deduplication to provide the most benefit when used in conjunction with Snapshot copies, the following best practices should be considered:

· Run deduplication before creating new Snapshot copies.

· Remove unnecessary Snapshot copies maintained in deduplicated volumes.

· If possible, reduce the retention time of Snapshot copies maintained in deduplicated volumes.

· Schedule deduplication only after significant new data has been written to the volume.

· Configure appropriate reserve space for the Snapshot copies.

· If the space used by Snapshot copies grows to more than 100%, it will cause df -s to report incorrect results, because some space from the active file system is being taken away by Snapshot, and therefore actual savings from deduplication aren’t reported.

· If snap reserve is 0, you should turn off the Snapshot auto-create schedule (this is the case in most LUN deployments).

So nothing very surprising or massively helpful I am afraid...

And here is the explanation of file folding:

http://now.netapp.com/NOW/knowledge/docs/ontap/rel732_vs/html/ontap/onlinebk/GUID-8E8AEEB7-4E25-4A78-ACCB-882A91B1D61C.html

File folding describes the process of checking the data in the most recent Snapshot copy, and, if it is identical to the Snapshot copy currently being created, just referencing the previous Snapshot copy instead of taking up disk space writing the same data in the new Snapshot copy.

Without file folding everything would only be worse, as multiple snaps would get ballooned, not just one!
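
To illustrate the idea in the description above, here is a rough sketch of what folding amounts to when a client rewrites a file with mostly unchanged content. It is only a model: write_file, the block ids and the block-by-block comparison are assumptions made for the example, not how ONTAP implements the option.

```python
# Hypothetical model of file folding; names and ids are invented for illustration.

def write_file(new_contents, snapshot_blocks, next_free_id, folding):
    """Model a client rewriting a whole file.

    new_contents:    list of block contents written by the client
    snapshot_blocks: list of (physical_id, content) for the same file
                     in the most recent snapshot
    Returns (physical ids now used by the file, number of newly allocated blocks).
    """
    used, allocated = [], 0
    for i, content in enumerate(new_contents):
        same = i < len(snapshot_blocks) and snapshot_blocks[i][1] == content
        if folding and same:
            # Identical to the snapshot: reference the existing block.
            used.append(snapshot_blocks[i][0])
        else:
            # Changed (or folding disabled): consume a fresh block.
            used.append(next_free_id + allocated)
            allocated += 1
    return used, allocated


snap = [(100, "hdr"), (101, "body"), (102, "tail")]
rewrite = ["hdr", "body", "tail2"]   # only the last block really changed

no_fold = write_file(rewrite, snap, 200, folding=False)
fold = write_file(rewrite, snap, 200, folding=True)

print(no_fold)  # every block is new, so the snapshot holds three old blocks
print(fold)     # only the changed block is new
```

With folding off, a full rewrite leaves all three old blocks trapped in the snapshot; with folding on, only the one block that really changed is charged to it.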

Regards,

Radek

acistmedical

Thanks for the info.

I have turned file folding on and I am also following the dedupe best practices. It would be nice if they ever made it possible to dedupe snaps.

Thanks

acistmedical

So what will happen if I delete the large snap that grew because of the dedupe? Will the previous one increase in size?

Attached is my snap pic. The 2nd from the top is the snap that increased in size after the dedupe. Can I delete it without making another one grow?

radek_kubka

"So what will happen if I delete the large snap that grew because of the dedupe? Will the previous one increase in size?"

Yes, this is exactly what will happen. I tried this the other day in a lab environment & it is one of the most frustrating things I've ever witnessed: you won't get any actual savings unless you get rid of all snapshots taken prior to the de-dupe scan.
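
The behaviour described here can be reproduced with a small model. The sketch below assumes that each snapshot is charged for the blocks it holds that nothing newer (a later snapshot or the active file system) still holds; that accounting rule and the function name reported_sizes are assumptions chosen to match what is observed in this thread, not a statement of how ONTAP computes the figures.

```python
# Hypothetical accounting model; not the actual ONTAP calculation.

def reported_sizes(snapshots, active):
    """Per-snapshot sizes, oldest first.

    snapshots: list of sets of physical block ids, oldest first
    active:    set of physical block ids used by the active file system
    A snapshot is charged for blocks it holds that nothing newer still holds.
    """
    sizes = []
    newer = set(active)
    for snap in reversed(snapshots):   # walk from newest to oldest
        sizes.append(len(snap - newer))
        newer |= snap
    return list(reversed(sizes))


old_snap = set(range(10))   # taken before dedupe
new_snap = set(range(10))   # also taken before dedupe, same blocks
active = {0, 5}             # after dedupe only two blocks remain in use

sizes_before_delete = reported_sizes([old_snap, new_snap], active)
sizes_after_delete = reported_sizes([old_snap], active)

print(sizes_before_delete)  # the newest snapshot carries the whole charge
print(sizes_after_delete)   # after deleting it, the older one inherits it
```

Under that assumption the released blocks are simply re-attributed to the next older snapshot, so nothing is freed until every snapshot taken before the dedupe run has been deleted or has expired.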

Sweet...

keitha

The trick, as Radek mentioned, is to dedupe before snapshotting. If you are snapping daily, dedupe daily. If you snapshot hourly, retain only a limited number of those snaps; the blocks will only be trapped until the snapshot expires. So in your current model, you just have to be patient until your snapshot schedule rotates through and you no longer have any snaps from before dedupe. From then on, try to dedupe prior to any long-term snaps being taken.

Keith

acistmedical

So I take 2 snaps a day, at 12AM and 12PM, and keep them for 7 days. My dedupe is set to AUTO, so it runs very rarely; when it does, it creates a huge snap.

What would be the best dedupe options then? Every day at 11PM, an hour before the snap? I have tried that, but then the midnight snaps are big.

What about running dedupe manually only, like once every month, and deleting all snaps before I do that? Would that be the best solution?

Thanks

mitchells

Depending on the type of data you have, you may also want to make sure that you have CIFS file folding set to on: options cifs.snapshot_file_folding.enable on

Thanks,

Mitchell

keitha

If the midnight snaps are still big when running dedupe at 11AM, I would run dedupe at 11AM AND 11PM. Running it twice a day should mean that each run finishes very quickly and the snaps stay small...

radek_kubka

"If the midnight snaps are still big running dedupe at 11AM I would run dedupe at 11AM AND 11PM."

Seems like a feasible approach to me - much better than deleting snapshots. At the end of the day, they are for backup (and, more importantly, restore) purposes, so keeping them may give more benefits than dedupe savings!

And some snaps can't be (easily) deleted, e.g. a baseline for SnapMirror transfer.

acistmedical

I'll try it and let you know. I just set up dedupe to run at 11 AM and 11 PM.

File folding is enabled too.

dwarburton

How did it go, acistmedical? Have things improved now?

acistmedical

Hey, thanks for reminding me. Completely forgot about it.

Attached are my snap sizes with file folding on and dedupe run at 11 AM and 11 PM.

Snaps run an hour later.

They are huge; normally they should be below 400 MB each, and total around a few GB.

I'm putting dedupe back on Auto and will post new snaps next week.

Greg

dwarburton

Heya Greg - this is your weekly reminder!

Any updates for us?

acistmedical

Here you go, so much better. I think this is one of the major issues that NetApp should address, because in reality it makes CIFS dedupe completely useless if used with snaps.

radek_kubka

Hi,

Many thanks for sharing this.

Well, your journey proves that dedupe together with snapshots can actually work sensibly, but a careful approach is required.

To a certain extent this is covered in TR-3505: Run deduplication before creating new Snapshot copies

(yes, arguably it could be more descriptive)

Regards,

Radek

dwarburton

Thanks for your posts and for taking the time to upload your results. Although, and this is no surprise recently, I'm a little confused...

Everything I've read says you should de-dupe before snapshotting, but your results show that it's best left on Auto, at least with regard to CIFS. Is it possible that the ASIS process was still running when the snapshots started an hour later?

acistmedical

No, ASIS does not run during the snap. Changes are small and it only takes a few minutes to run.

When set to AUTO it doesn't run at all because there are less than 20% changes, thus the snaps are small.
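
As a rough illustration of the 20% figure mentioned here, the check below models the auto schedule as a simple ratio test. What exactly ONTAP measures for that ratio is an assumption in this sketch; the function name, the fingerprint counts and the zero-baseline rule are made up for the example, and only the 20% threshold comes from this thread.

```python
# Hypothetical helper; the 20% default is the figure quoted in this thread.

def auto_dedupe_would_run(new_fingerprints, stored_fingerprints, threshold=0.20):
    """Return True once new data reaches the threshold share of existing data."""
    if stored_fingerprints == 0:
        # Nothing fingerprinted yet: any new data counts as crossing the threshold.
        return new_fingerprints > 0
    return new_fingerprints / stored_fingerprints >= threshold


# Invented example numbers: a 3% change week versus a 25% change week.
quiet_week = auto_dedupe_would_run(new_fingerprints=3_000, stored_fingerprints=100_000)
busy_week = auto_dedupe_would_run(new_fingerprints=25_000, stored_fingerprints=100_000)

print(quiet_week, busy_week)
```

On a volume with a low change rate the test stays false for a long time, which would explain why the snapshots stay small on Auto: dedupe is simply not running.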

radek_kubka

Hi Greg,

I must admit I missed that one earlier & thought your space-efficiency improvement (in the latest results) was due to carefully scheduled de-dupe scans.

"I'm putting dedupe back on Auto and will post new snaps next week."

What it probably means is that your dedupe scan simply doesn't run because you are not writing more than 20% of new data to the volume, hence the snapshots are not growing excessively.

Regards,
Radek
