ONTAP Discussions

snapvault - am I making this too hard?

JohnW

SnapVault seems rather complex to accomplish properly since NetApp does not perform inline deduplication.

 

Here are the steps it seems like I need to script, since there's not a good way to do it with the existing tools (a rough sketch of the script follows the list).

 

1. Run dedupe on the source volume

2. Check and wait for it to complete

3. Once it is complete take a snapshot and label it

4. Invoke SnapMirror to update and replicate to the secondary storage at the DR site

5. Check and wait for SnapMirror to complete

6. Once SnapMirror is complete, start the vault from the secondary site to the archive array

7. Either before or after step 6, remove the manual snapshot the script took
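
To make those steps concrete, here is a rough sketch of the kind of script I mean. The cluster addresses, SVM/volume paths and the snapmirror-label are placeholders, and the CLI syntax is what I would expect on clustered Data ONTAP 8.3, so double-check the exact commands and output fields against your version.

    #!/bin/bash
    # Rough sketch only - placeholder names; verify CLI syntax/output on your ONTAP release.
    PRI="admin@cluster-primary"      # primary cluster mgmt LIF (placeholder)
    DR="admin@cluster-dr"            # DR/mirror cluster mgmt LIF (placeholder)
    ARC="admin@cluster-archive"      # archive/vault cluster mgmt LIF (placeholder)
    SNAP="scripted_$(date +%Y%m%d_%H%M)"

    # 1. Run dedupe on the source volume
    ssh "$PRI" "volume efficiency start -vserver svm1 -volume vol1"

    # 2. Wait for the efficiency operation to go back to Idle
    until ssh "$PRI" "volume efficiency show -vserver svm1 -volume vol1" | grep -q "Idle"; do
        sleep 300
    done

    # 3. Take a snapshot and label it so a vault policy rule can match it later
    ssh "$PRI" "volume snapshot create -vserver svm1 -volume vol1 -snapshot $SNAP -snapmirror-label daily"

    # 4. Update the mirror (SnapMirror is driven from the destination cluster)
    ssh "$DR" "snapmirror update -destination-path svm_dr:vol1_mirror"

    # 5. Wait for the mirror transfer to finish
    until ssh "$DR" "snapmirror show -destination-path svm_dr:vol1_mirror -fields status" | grep -q "Idle"; do
        sleep 300
    done

    # 6. Kick off the vault update from the mirror copy to the archive array
    ssh "$ARC" "snapmirror update -destination-path svm_vault:vol1_vault"

    # 7. Clean up the manual snapshot on the source
    ssh "$PRI" "volume snapshot delete -vserver svm1 -volume vol1 -snapshot $SNAP"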

 

Since NetApp doesn't do inline dedupe, these steps seem necessary to achieve maximum savings in the vault and make the network transfer as efficient as possible.  You could *try* to set up all of these schedules in Data ONTAP and just guess that "well, dedupe will probably be done by X time," but that is not very efficient or reliable.

 

By the way, I have all of this scripted already... just wondering if there's a better way or if I'm missing something. What I'm trying to prevent is needlessly transferring data that is not deduplicated, and I also don't want to put a bunch of blown-up snapshots in my vault because dedupe ran after the vault snapshot was taken.

 

It seems this could also be more efficient if NetApp's snapshots didn't lock all the blocks but could be deduplicated as well.

2 REPLIES

paulstringfellow

A couple of questions, John.

 

firstly what are you trying to achieve with this... I assume it's to reduce the backup footprint as much as possible?

 

secondly - is there a reason for mirroring then vaulting and not just vaulting from source to target?

 

you are right about inline dedupe - although that is changing, it will depend on what version of ONTAP etc. you are running.

 

what you could do though is inline compression, which would probably give you a better space saving than dedupe anyway - on the NetApp support site there is some useful info about how SnapVault works with other ONTAP technologies, it's probably worth looking up...

 

one little snippet though is this...

 

You should be aware of the following best practices:

  • You should run postprocess compression to compress existing data on the primary system before running the baseline transfers for SnapVault.
  • You should use inline compression to achieve data compression savings on a SnapVault destination volume without affecting the Snapshot space.
  • You can independently configure the deduplication and data compression schedules on a source volume, because the schedules are not tied to the SnapVault update schedule.

    However, for a destination volume the deduplication and data compression schedules are tied to the SnapVault update schedule
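
as a rough illustration of that second point, the destination-side settings would look something like this (clustered Data ONTAP 8.3 syntax, and the vserver/volume names are just placeholders - verify against your release):

    volume efficiency on -vserver svm_vault -volume vol1_vault
    volume efficiency modify -vserver svm_vault -volume vol1_vault -compression true -inline-compression true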

don't know if that helps or gives you a baseline for reviewing the process you are looking at - you're right that you want to avoid scripts as much as you can...

 

regards

Paul.

JohnW

Thank you, Paul.  Here are the answers to your questions, plus some questions of my own.

 

 

"firstly what are you trying to achieve with this... I assume it's to reduce the backup footprint as much as possible?"

Yes.  I'd like to reduce the footprint and eliminate the possibility of having a SnapVault snapshot that is stuck in the vault for our entire retention period and blown up in size because dedupe ran after it was taken. (We really need NetApp to dedupe the snapshots...)

 

"secondly - is there a reason for mirroring then vaulting and not just vaulting from source to target?"

Yes.  We have a decent amount of data that we replicate to the mirror location.  The mirror is used for SRM/hot copy.  Instead of transferring everything once again over the WAN we are going to vault from the mirror.  We keep a few copies of hot data and then vault the rest off for our retention period.
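
For context, it's basically a mirror-vault cascade. Something like this, with placeholder paths and schedules, and syntax as I'd expect it on 8.3:

    # primary -> DR mirror (managed from the DR cluster)
    snapmirror create -source-path svm1:vol1 -destination-path svm_dr:vol1_mirror -type DP -schedule hourly

    # DR mirror -> archive vault (managed from the archive cluster)
    snapmirror create -source-path svm_dr:vol1_mirror -destination-path svm_vault:vol1_vault -type XDP -policy XDPDefault -schedule daily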

 

"you are right about inline dedupe - although that is changing, but will depend on what verson of OnTap etc you are running."
We are running 8.3P2.  In 8.3.1RC they make it sound like it's okay to run inline compression all the time on AFF and FlashPools but I haven't heard anything yet on inline dedupe...or global (aggregate based) dedupe.

 

"what you could do though is inline compress, probably give you a better space saving than dedupe anyway - on the NetApp support site there is some useful info about how snapvault works with other OnTap technologies, it's probalby worth looking up..."

For daily backups/vaults I would expect dedupe to get us better savings than compression.  Are you suggesting that we should enable dedupe/compression on the vault with its own schedule? It's my understanding that doing this rehydrates the data from the source when it lands in the vault... this isn't a huge issue since the vaulting would be taking place over 10GbE... I'm not sure how that works with the snapshots that get transferred after the baseline, in terms of how they are rehydrated.  We run deduplication on most of our source volumes but not compression.

 

 

 

"You should be aware of the following best practices:

  • You should run postprocess compression to compress existing data on the primary system before running the baseline transfers for SnapVault.
  • You should use inline compression to achieve data compression savings on a SnapVault destination volume without affecting the Snapshot space.

  • You can independently configure the deduplication and data compression schedules on a source volume, because the schedules are not tied to the SnapVault update schedule.

    However, for a destination volume the deduplication and data compression schedules are tied to the SnapVault update schedule"

I'm curious about this.  Should we always enable inline compression on SnapVault destinations? Or just leave them with no efficiency and take the savings on the source volumes? Or give them their own schedules? That part/practice is a bit confusing to me, and I'm not sure what will end up being best.

 
