ONTAP Discussions

Am I Understanding This Correctly? (SnapVault)

johnlockie
13,783 Views

It appears that all SnapVault is doing is copying (bit for bit) my primary data to a secondary filer.

Once I copy it I snap it using a traditional snapshot on the secondary.

SnapMirror is also a bit-for-bit copy, but it also copies the volume's snapshots and cannot have its own snapshot copies that are stored long term....

So if I had a volume I wanted to have quarterly backups of, and keep those quarterly backups indefinitely I would:

1. Create snapvault copy from primary to secondary

2. Take a snapshot on the secondary

3. After 90 days, update the snapvault volume (essentially replicating all the changed bits over those 90 days, which is fine because I have snap'd the original version in step 2)

4. Take another snapshot

5. Repeat...and continually repeat (never deleting the 90 day snapshots on secondary filer)....for whatever retention period the business requires

I would now have snapshots 90 days apart on the secondary filer, but on the primary I would have the more usual handful of shorter term snapshots....

Is this correct?  If so, how is this considered a viable archiving solution?  I would need to mount the snapshots to view archived data eh?

I think I am misunderstanding something at the 10,000 foot level here.

John


rmharwood
13,722 Views

Sounds about right to me. In what way do you consider this not to be a viable solution?

A 90-day backup period is non-standard unless you use Protection Manager, but it can also be scripted fairly easily.

You can present the secondary data to a host for reading or you can do a "snapvault restore" operation to put the entire qtree back on the primary.

Richard

johnlockie
13,722 Views

To be frank....

There are a couple of reasons I don't see this as viable (for the cost).  First, I do understand the lack of support for 90 days, etc.  Why create a "vault" solution that restricts you to weekly or hourly backups?  Seems monthly would be good here.  Now it makes sense to me that when I am snapvaulting the data from primary to secondary that I do it as often as I can, but why restrict the snapvault snap sched on the secondary device to these hourly and daily times?  Also, the process seems archaic.  I like snapmirror because everything is done on secondary, which pulls data from primary filers.  Now, why I cannot set up a snapvault schedule much like we do in snapmirror.conf is beyond me.  And why the secondary filer is not designed to say "oh...you are asking me to take a snapshot of a snapvault volume....let me pull the most current data and THEN snapshot it".  Like you said, most of this can be scripted - but c'mon....how many thousands do I pay per license? ;-).  If I am correct, then this is a huge feature flaw, or it's an excuse to push Protection Manager....

Secondly, if my summary is correct then it's not doing anything incredibly special here.  It's creating a standard bit for bit copy of my volume that is read and write capable (as compared to restricted snapmirror volumes), which simply means....I can take snapshots!   OK great....so why is this thousands of dollars again!?!?  I already have the license to create snapshots on volumes, I have the license to create these volumes, oh wait....I need a license to copy a volume from a primary to secondary device wherein I can then snapshot it and if I want to create a new snapshot with an updated copy I need to move changes from primary to secondary (read: copy).

I might reach out to one of the NetApp SEs I know and ask for some interpretation (the "marchitecture" for snapvault is some of the worst I have seen....and maybe that's because it's not as powerful as they want us to think).  The reseller sales guy I deal with has no idea how this actually works, or what the restrictions are.  I am running on a trial license for my primary heads.....sigh, just trying to decide if I should pull the trigger on the expense.

jmerrill
13,724 Views

John,

Overall, I believe your data flow is correct.  I think you may be oversimplifying some things.  I'm sorry if your reseller isn't accurately representing the product - SnapVault is a very powerful product.

I think the first thing to do is to highlight the differences between SnapMirror and SnapVault (at a VERY high level).

Volume SnapMirror has the ability to Failover/Failback, and also replicates at the volume level - if you have a volume with 10 snapshots, your destination will have the exact same 10 snapshots.

SnapVault brings in the long term retention and allows for "broken" retention (10 snapshots on the source, and say 90 on the destination).  In addition, it works at the sub-volume (qtree) level - which provides you with flexibility for your backup design.

Diving a little more into SnapVault.   SV allows you to keep different # of snapshots on the source and destination - this could mean keeping 24 hourly and maybe 4-5 daily on the source while you may only transfer an hourly every 6 hours and keep just 2-3 and then keep your daily for 30 days and your weekly snapshots for 3 months.  This gives you the ability to keep snapshots for a longer period, most likely on cheaper storage also.

SnapVault provides the ability for you to set the retention and then SV will roll the snapshots based on your settings.  In addition, you can pick/choose the snapshots you want to replicate (in the example above, maybe you don't want to replicate any hourly snapshots).  All the replications are also done similar to SnapMirror where just the changed blocks are transferred and written to disk.  SnapVault provides you the ability to implement a true incrementals forever technology - no more weekly fulls and daily incrementals.

SnapVault working at the qtree level is a benefit for consolidation: you could take multiple qtrees from different volumes (on the same or different systems) and replicate those to 1 destination volume.  Assuming you are running a version of ONTAP 7.3, you can also enable deduplication on the SV Secondary volume - further reducing your footprint on Secondary storage.

If you wanted to take a snapshot every 90 days, this would require a script.  Within ONTAP we can only create hourly, nightly and weekly snapshots.  If you implement Protection Manager, Monthly snapshots are also an option.  Also, the data in the SnapVault destination volume is in fact read-only - if you needed R/W access, a FlexClone would be the best solution for that.

Hopefully this helps a little.  If you need further discussion, please let me know.  It may also be worthwhile to skim TR-3487 (attached).

Thanks!

Jeremy

rmharwood
13,722 Views

Hey, I agree with much of what you're thinking. On paper the license costs for some of the ONTAP features seem very high - others as well as snapvault. It can make it quite difficult to justify expenditure and ROI.

I raised the question last year to Netapp about the possibility of at least including a monthly snapshot schedule on the Snapvault secondary. I was told that this was a limitation in the core feature set of ONTAP, which could only support hourly, daily and weekly. I have since set up a small script to take snapshots on various volumes on the first Sunday of each month. I actually take "snapvault snapshots" as opposed to regular ones. Not entirely sure what the difference is but I can do "snapvault snap create volname sv_monthly" because I have a "snapvault snap sched" of "create volname sv_monthly 6@-" to retain 6 snapshots prefixed "sv_monthly".
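For anyone wanting to script this, here is a minimal Python sketch of the "first Sunday of each month" logic. It only builds the "snapvault snap create" command strings quoted above. The volume names are hypothetical, and how the command actually reaches the filer (ssh, rsh, or something else) is deliberately left out, since that depends on your environment.

```python
import datetime


def is_first_sunday(day: datetime.date) -> bool:
    """True if `day` is the first Sunday of its month."""
    # date.weekday(): Monday is 0 ... Sunday is 6.
    # The first Sunday always falls on day 1 through 7.
    return day.weekday() == 6 and day.day <= 7


def monthly_snapshot_commands(day, volumes, snap_name="sv_monthly"):
    """Return the commands to run on `day` (an empty list on other days)."""
    if not is_first_sunday(day):
        return []
    return [f"snapvault snap create {vol} {snap_name}" for vol in volumes]


if __name__ == "__main__":
    # "vol_cifs" and "vol_sql" are made-up volume names for illustration.
    for cmd in monthly_snapshot_commands(datetime.date.today(),
                                         ["vol_cifs", "vol_sql"]):
        print(cmd)
```

Run daily from a scheduler, this prints nothing except on the first Sunday. Retention is still handled by the "6@-" schedule on the filer, not by the script.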

Personally I'm not a big fan of Protection Manager (DFPM) and before we had to have it (for OSSV management and for Snapdrive/Snapmanager tie-in) I could have happily done without it. I still have many Snapvault relationships that were set up and are maintained outside of DFPM. I'm a storage admin and more than comfortable with command line operations and scripting but I can see the benefit of these tools for admins without this level of confidence. I do like the way that Operations Manager reports on failed backups and Snapvault relationship lag times though.

The positives, for me, regarding Snapvault:

* I can create an incremental backup of a volume with 12 million files in 3 hours - it took over 4 days to go to tape

* I can create an incremental LUN backup of an NTFS filesystem containing 3 million files in just a few minutes - it took over 12 hours to go to tape

* I can use it in conjunction with OSSV (and Protection Manager) to backup hosts in a fraction of the time

* Unlike the standard snapshot feature I can schedule exactly when the updates take place and how many of each type to retain

* I can present the backup data back to the original host (or any other)

The incremental nature of transfers is the big win, obviously it's what snapmirror does also.

Not sure if any of that helps or makes sense!

Richard

johnlockie
13,723 Views

Thanks for the replies!

So basically, it appears I understand the 10,000 foot view well enough.

I am seeing my issue is now more at the closer view.  I will read through the best practices guide more carefully.

I do understand, and see the benefit of using the qtree model.  I have it set up that way right now, and in fact had to move a couple of LUNs into qtrees that were not in qtrees already! =O

What I am not seeing is the ability to snapvault a single snapshot from primary to secondary (which is really the original point of my post).  The way I see it operating in my environment (after setting it up) is that I am copying the data (I was saying volume earlier, but correct, this is a qtree) from primary to secondary.  Then on the secondary I perform a snapshot.  Then to perform another snapshot I am copying the qtree (incremental this time), and then taking another snapshot.  This method just doesn't seem to fit what I THOUGHT it did.

How it was sold to me was this (and maybe the reseller is not even clear....): You can take a snapshot and push it to a secondary storage device where it will reside and can be stored long term.

It was sold to me as something to solve the delta issues with snapshots that I am storing longer terms.

I am an Enterprise Vault customer too....and use Symantec's exchange archive.  They are selling us the other features like SQL and CIFS archiving - so I am comparing both cost and features of this against SnapVault.

rmharwood
13,723 Views

Gotcha.

Interested to hear about EV as it's something we're considering as an archive solution. Right now we don't have the space to Snapvault our 12TB or so CIFS files, especially with the one year retention that's required. I'm not certain that we could dispense with backups (either Snapvault or to tape) entirely. Would you consider EV to be a backup solution as well as for archiving?

We still rely upon offsite tape copies for DR purposes. I would imagine we'd still need to have up to date copies of our EV offsite also. Any ideas on how to feasibly achieve this?

Cheers,

Richard

johnlockie
13,723 Views

We use SME for "backup" and EV for "archive".  But because I am journaling mail I consider EV to be as good a backup as SME is.  I can access the SQL database for EV in a matter of seconds, and we can pull mail in and out of archives really quickly.  It's basically the same underlying technology as SIS and VSS and snapshots, etc.  NetApp doesn't have the monopoly on snapshots or SIS....they just do it better than anyone else so far as I have seen, and their software is more powerful.

We snapmirror everything right now....but what I don't like is that I don't have long term backups of key data.  CIFS I could care less about....like you said.....I have a pretty decent amount of weekly snapshots I keep on that volume and then we snapmirror it, so I can generally go back up to 3 months just using traditional snapshots!  But beyond 3 months who cares?


It's email and accounting I am concerned about.  My company is in construction, and legally we are liable for work for 10 years (I think we can blame the state of California and our litigious attitude for that)....so technically my direction on CIFS doesn't even cut it.  I need to keep everything for up to 10 years.  Nuts.....

So here I have this killer DR solution, no tapes, off site replication, multiple backups, etc. etc. (vmware, the usual bag of vendors, etc.).  But I have no way to grab a volume and "vault" it for 10 years "as is".  I was considering SnapVault, but now I am wondering if I should not just employ some type of disk to tape backup at the end of each year on top of snapvault.  And in this I have that frustration - we have spent hundreds of thousands on infrastructure to do away with tape - why would I bring it back even for a simple archive?  And who wants to tape archive multiple TBs?  Not me =P.

So I am clawing for a solution.  Something that gives me peace of mind when I go to bed.  I see all these holes in our beautiful little architecture, and I am seeking to address those.  They are tiny mis-shapen holes LOL.

jmerrill
13,723 Views

John,

What I am not seeing is the ability to snapvault a single snapshot from primary to secondary (which is really the original point of my post).  The way I see it operating in my environment (after setting it up) is that I am copying the data (I was saying volume earlier, but correct, this is a qtree) from primary to secondary.  Then on the secondary I perform a snapshot.  Then to perform another snapshot I am copying the qtree (incremental this time), and then taking another snapshot.  This method just doesn't seem to fit what I THOUGHT it did.

This is exactly what SV enables you to do.

Hopefully this example will help.  I have 2 volumes, "sv_src" on the SV Source system and "sv_dest" on the SV destination.  On the source I also have a qtree named qt1, which is replicating to qt1_dest on the destination system.

After the initialize, the snap list shows:

Source

vmcoe-fas3070-01*> snap list sv_src
Volume sv_src
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
31% (31%)    0% ( 0%)  Apr 02 13:19  vmcoe-fas2020-01(0135018611)_sv_dest_qt1_dest-src.0 (snapvault)


Destination

vmcoe-fas2020-01*> snap list sv_dest
Volume sv_dest
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
42% (42%)    0% ( 0%)  Apr 02 13:23  vmcoe-fas2020-01(0135018611)_sv_dest-base.0 (busy,snapvault)

At this time, I only have the "base" snapshots that are required by SV.  I created a schedule on the source and destination also - since this is a test and I am manually running updates, I used the "-" in the schedule telling SV to only worry about retention, and the schedule will be done outside of the ONTAP scheduler (this was also described above as a way to create snapshots with a script).

Source Schedule

vmcoe-fas3070-01*> snapvault snap sched sv_src
create sv_src sv_test 8@-

Destination Schedule

vmcoe-fas2020-01*> snapvault snap sched sv_dest
create sv_dest  0@-
create sv_dest sv_test 16@-


Note that the other create schedule is created by SV; I only created the one associated with the snapshot sv_test.

So in this configuration, I am telling SnapVault to retain 8 snapshots named "sv_test" on the source and 16 of the same name on the destination.  (If we were using the internal scheduling, you would want to make sure you use the "-x" option in the snapvault snap sched command to tell SV that it needs to get the data from the Primary system.)
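To make the retention idea concrete, here is a small Python model of the rotation described above: the newest snapshot is always .0, older ones shift up by one, and anything beyond the retention count is dropped. This is only a simplified sketch of the observable behaviour, not how ONTAP implements it; the function names and the integer labels are made up for illustration.

```python
def rotate(snapshots, new_label, keep):
    """Add a new snapshot and apply retention.

    `snapshots` is a newest-first list of labels; a label's position in the
    list is its numeric suffix (position 0 is the .0 snapshot).
    """
    updated = [new_label] + snapshots
    return updated[:keep]  # anything past the retention count is deleted


def names(snapshots, base="sv_test"):
    """Map snapshot names such as sv_test.0 to the label they hold."""
    return {f"{base}.{i}": label for i, label in enumerate(snapshots)}


if __name__ == "__main__":
    source = []
    for hour in range(10):           # ten creations on the source
        source = rotate(source, hour, keep=8)
    print(names(source))             # 8 retained, sv_test.0 is the newest

    destination = []
    for transfer in range(20):       # twenty transfers to the destination
        destination = rotate(destination, transfer, keep=16)
    print(len(destination))          # 16 retained
```

The two systems rotate independently, which is why the source can hold 8 and the destination 16 under the same snapshot name.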

Now I've created 2 snapshots using the snapvault snap create command on the Primary system and I have yet to perform an update.

Source snap list

vmcoe-fas3070-01*> snap list sv_src
Volume sv_src
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
37% (37%)    0% ( 0%)  Apr 02 13:30  sv_test.0
59% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.1
62% (16%)    0% ( 0%)  Apr 02 13:19  vmcoe-fas2020-01(0135018611)_sv_dest_qt1_dest-src.0 (snapvault)

The destination will still look the same as above because I have yet to do a transfer.  Note that the "snapvault" soft lock still resides on the snapshot that was created for the baseline transfer.  Now, I can tell SV to update from either sv_test.0 or sv_test.1.  If we use sv_test.0, then SV will perform a comparison on the snapshots named vmcoe-fas2020-01(0135018611)_sv_dest_qt1_dest-src.0 and sv_test.0, meaning if there was a file that existed only in sv_test.1, it wouldn't get transferred.
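A toy Python model of that comparison may help. It treats each snapshot as a mapping of file path to a version number and computes what differs between the common base snapshot and the snapshot chosen for the update. The real transfer works on changed blocks rather than whole files, and the file names here are invented, so take this purely as an illustration of why a file that lived only in sv_test.1 never reaches the destination.

```python
def changes_to_transfer(base, target):
    """Compare the common base snapshot with the snapshot chosen for update.

    Both arguments map a file path to a version number. Returns the files
    that are new or changed in `target`, plus the paths deleted since `base`.
    """
    changed = {path: ver for path, ver in target.items()
               if base.get(path) != ver}
    deleted = sorted(path for path in base if path not in target)
    return changed, deleted


if __name__ == "__main__":
    baseline = {"a.txt": 1}                    # the baseline transfer snapshot
    sv_test_1 = {"a.txt": 1, "temp.log": 1}    # older; temp.log exists only here
    sv_test_0 = {"a.txt": 2, "b.txt": 1}       # newer; temp.log already deleted

    changed, deleted = changes_to_transfer(baseline, sv_test_0)
    print(changed)   # a.txt (modified) and b.txt (new); temp.log is absent
    print(deleted)
```

Updating from sv_test.1 instead would carry temp.log across, which is the "pick the snapshot you want" flexibility in action.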

After the update (and subsequent snapvault snap create), here's what we see on the source and destination

Source snap list

vmcoe-fas3070-01*> snap list sv_src
Volume sv_src
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
46% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.0      (snapvault)
63% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.1

Notice that the "snapvault" soft lock moved to sv_test.0 (and the base was also removed - this is because that was just a temporary snapshot required for SV to start).  If we were using Volume SnapMirror (VSM), we would have both the data for sv_test.0 and sv_test.1 on the destination.

Destination snap list

Volume sv_dest
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
40% (40%)    0% ( 0%)  Apr 02 13:39  sv_test.0
58% (42%)    0% ( 0%)  Apr 02 13:39  vmcoe-fas2020-01(0135018611)_sv_dest-base.1 (busy,snapvault)

Now on the destination we have sv_test.0 and the "base" SV snapshot (this snapshot will always reside on the SV destination - it's actually what gets updated by SV).

Now, we can create even more snapshots on the source with the sv_test name and you would see the following (remembering that the snapshot with the .0 is always the most recent)

Source snap list

vmcoe-fas3070-01*> snap list sv_src
Volume sv_src
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
38% (38%)    0% ( 0%)  Apr 02 13:39  sv_test.0
59% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.1
70% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.2
76% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.3
80% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.4
83% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.5      (snapvault)
85% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.6

Here you can see we've created 5 new snapshots, but none of them have been updated to the destination system yet (the snapshot with the "snapvault" soft lock was the snapshot last used for the SV update).  This means that data in those 5 snapshots only resided on the source system.  If I run an update, here's what you will see on the source and destination:

Source snap list

Volume sv_src
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
38% (38%)    0% ( 0%)  Apr 02 13:39  sv_test.0      (snapvault)
59% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.1
70% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.2
76% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.3
80% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.4
83% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.5
85% (46%)    0% ( 0%)  Apr 02 13:30  sv_test.6

Destination snap list

vmcoe-fas2020-01*> snap list sv_dest
Volume sv_dest
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
37% (37%)    0% ( 0%)  Apr 02 13:47  sv_test.0
57% (42%)    0% ( 0%)  Apr 02 13:47  vmcoe-fas2020-01(0135018611)_sv_dest-base.1 (busy,snapvault)
68% (46%)    0% ( 0%)  Apr 02 13:39  sv_test.1

So here we have 7 total snapshots created on the source, and only 2 on the destination.  This shows that you can pick the snapshot you would like to update from.  In these examples, I used the following snapvault update command

vmcoe-fas2020-01*> snapvault update -s sv_test.0 /vol/sv_dest/qt1_dest

I could have substituted any of the other snapshot names in the manual update, but typically you will want to update from the .0 snapshot (if you use the built-in schedules, it will always use the most recent snapshot of the given name).

In addition, one of the other benefits of SV is the concept of snapshot coalescing.  This allows us to take 1 snapshot at the end of all SV transfers for that destination volume.  If you compare this to Qtree SnapMirror (QSM), QSM will take a snapshot after each qtree completes its transfer - so a destination volume with 10 qtrees will create 10 snapshots, whereas an identical SV transfer will create 1 snapshot on the destination.

With regards to longer retention, this can be done - in the example above, I'm only retaining 8 snapshots named sv_test (let's say they are taken every hour), giving us 8 hours of local recovery points.  Now, assume we replicate this sv_test snapshot every 8 hours: I'm now keeping 16 copies of a snapshot named sv_test on the secondary system, and it's only replicated every 8 hours.  This gives me 16 copies of my data on the secondary, which are 8 hours apart.  The same concept can be applied to nightly or even weekly snapshots where you may only want to keep 1-2 nightly snapshots locally, but 90 remotely.  The only restriction is that you can't have more than 251 snapshots of any given volume - there are ways to get around this using FlexClones though.

If there is a long term requirement, you could go down the FlexClone path if needed, but I would recommend examining the requirements to see what they are.  If it's just a monthly snapshot for 10 years, it would need to be scripted, but you could keep that all on disk (10 years * 12 months = 120 snapshots).  The benefit here is that your data is still on disk, stored in a native format, and only the block changes are written to disk, while every snapshot looks like a full backup - this results in reduction of media (when compared to running say a full backup to tape every month).
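The arithmetic above is easy to check against the 251-snapshot limit with a few lines of Python. This is just a back-of-the-envelope sketch: the plan below combines the counts mentioned in this post (16 sv_test, 90 nightly, 120 monthly) as a hypothetical example, and the schedule names are illustrative.

```python
MAX_SNAPSHOTS_PER_VOLUME = 251  # the limit quoted above


def coverage_hours(copies, interval_hours):
    """Roughly how far back the retained copies reach."""
    return copies * interval_hours


def check_plan(plan):
    """`plan` maps a snapshot name to its retention count on one volume.

    Returns (total, fits), where `fits` is True if the total stays within
    the limit. Any other snapshots on the volume would count as well.
    """
    total = sum(plan.values())
    return total, total <= MAX_SNAPSHOTS_PER_VOLUME


if __name__ == "__main__":
    print(coverage_hours(16, 8))   # 16 copies, 8 hours apart
    print(10 * 12)                 # monthly snapshots kept for 10 years
    print(check_plan({"sv_test": 16, "sv_nightly": 90, "sv_monthly": 120}))
```

If a plan does not fit, the options are trimming retention on one of the schedules or splitting it across volumes (or the FlexClone route mentioned above).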

I'm sorry for the long winded reply, but I hope this helps slightly and doesn't confuse you.

Thanks!

Jeremy

johnlockie
13,723 Views

All I can say to that is...holy crap!

100% thanks.  Don't apologize for being long winded.

It's a holiday weekend.  So give me a few days to chew through it.  I will try to update this thread next week once I dig in, but from the outset this information is the exact kind of stuff I needed to bridge from that 10k feet view to the 1 foot view.

My NetApp partner/var is coming in a week from today I think, they are supposed to confirm with me today.  I am one of those guys that is actually willing to pay an engineer to come help design and not just implement! =P

Their engineers are better than sales.....bridging this architecting gap is sometimes one of the most difficult parts of my job.  Especially because I am concerned with financial ramifications on top of technical ones.  Sales makes it sound too obvious...or simple.

John

jmerrill
11,680 Views

John,

Please be sure to read this at your leisure and let us know if you have any further questions.  Hopefully this gives you a decent overview and I (or your var/reseller) can answer any outstanding questions you may have.

Thanks!

Jeremy

adaikkap
11,679 Views

Hi Richard,

Can you elaborate on what prevents you from using Protection Manager?

Just trying to understand what is preventing your adoption of it.

regards

adai

rmharwood
11,679 Views

Nothing is really preventing me from using it. I set up a substantial amount of Snapvault relationships before we even had it, so I was comfortable with doing that on the command line. I was never really successful in pointing two primary qtrees into a single secondary volume in DFPM. It always seems to want me to create a separate volume on the secondary for each on the primary. That could just be my misunderstanding though. I also dislike the snapshot naming conventions that DFPM uses.

For OSSV management it works for us quite well.

Cheers,

Richard

adaikkap
11,680 Views

With Protection Manager 3.8 or later we do allow fan-in of secondary volumes for SV and QSM relationships.

This allows qtrees from different primaries to be backed up to a single secondary volume.

The option is called "dpMaxFanInRatio", which you will have to set to more than 1 to enable fan-in.

Also, starting with 3.8 we allow flexible snapshot naming conventions, which are controlled by the following options.

pmCustomNameUseHostName

pmCustomNameUseQtreeList

pmCustomNameUseRetentionType

pmCustomNameUseVolumeName

Regards

adai

rmharwood
11,680 Views

Thank you for this information. I will follow up and make some changes to our DFM environment.

Richard

johnlockie
11,679 Views

Just a quick update and summary/clarification from my original post.  I had an engineer here with our local NetApp VAR Friday and we went over it together.

1. I did not realize that snapvault snapshots are intended to replace the local snapshots on the primary device.  This means that I blow out the snap sched configs, and replace them with snapvault snap sched configs (on PRIMARY).  This eliminates the confusion I had earlier regarding local snapshots and snapvault snapshots.  They are one and the same now on the primary filer, and I can do away with my old snapshot schedules.

2. Now I see that no snapshots are actually taken on the secondary filer.  In actuality, they are just pulling the snapvault snapshots from primary to secondary, and then allowing for a separate "unique" schedule in order to expand retention times to more reasonable distances for "archiving".  On the secondary filer, using the command "snapvault snap sched -x" is simply stating "copy from source any missing snapshots since last copy and then run retention rules to delete any snapshots that are not within set time frames to be kept".


The only serious issue I raise is the lack of support for monthly (and even yearly) schedules on the snapvault secondary, but I will look into Protection Manager and scripting solutions.  I see now that the only thing that needs to happen is that a manual snapvault snapshot be made on the primary, allowed to replicate, and then manually deleted after the retention period expires.

Otherwise, after reading the Best Practices Guide for OSSV I am pleasantly surprised at how it works.  My original understanding was not entirely correct, as the way I thought NetApp was doing it was more crude than the way it is actually done 🙂

jmerrill
9,860 Views

John,

Sorry for the delay - glad you were able to get many of your questions answered!

I just wanted to add a little more detail/clarification.

1. I did not realize that snapvault snapshots are intended to replace the local snapshots on the primary device.  This means that I blow out the snap sched configs, and replace them with snapvault snap sched configs (on PRIMARY).  This eliminates the confusion I had earlier regarding local snapshots and snapvault snapshots.  They are one and the same now on the primary filer, and I can do away with my old snapshot schedules.

In addition, this also gives you the ability to name your snapshots with your own naming convention.  All the documentation refers to these normally as sv_hourly, sv_nightly, and sv_weekly, but that's mainly just for ease of use and relating the snapshot names back to the system created snapshots.

2. Now I see that no snapshots are actually taken on the secondary filer.  In actuality, they are just pulling the snapvault snapshots from primary to secondary, and then allowing for a separate "unique" schedule in order to expand retention times to more reasonable distances for "archiving".  On the secondary filer, using the command "snapvault snap sched -x" is simply stating "copy from source any missing snapshots since last copy and then run retention rules to delete any snapshots that are not within set time frames to be kept".

Actually there's a little more to it.  SnapVault will just pull the data from the most recent snapshot (the one with .0 in the name) for the given name.  For example if you have the following snapshot names sv_hourly.0, sv_test, and sv_hourly.1, SnapVault will just run the diff on sv_hourly.0 and sv_hourly.1 (assuming .1 is the common snapshot).  Any data/files that only existed in sv_test will not be replicated.
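As a rough illustration of that selection rule, here is a short Python sketch that filters a snapshot list down to the numbered snapshots of one schedule name, newest first. It is only a model of the naming convention described here, not ONTAP code, and the function name is made up.

```python
def numbered_snapshots(snapshot_names, base_name):
    """Return snapshots named <base_name>.<N>, ordered by N (newest first)."""
    prefix = base_name + "."
    found = []
    for name in snapshot_names:
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            found.append((int(suffix), name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    on_primary = ["sv_hourly.0", "sv_test", "sv_hourly.1"]
    candidates = numbered_snapshots(on_primary, "sv_hourly")
    print(candidates)      # sv_test is not part of the sv_hourly schedule
    print(candidates[0])   # the .0 snapshot is the one the update reads from
```

Anything that exists only in a snapshot outside that list, such as sv_test here, is never looked at by the sv_hourly update.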

I see now that the only thing that needs to happen is a manual snapvault snapshot be made on the primary, let it replicate and then manually delete it after retention period expires.

Although true, you may not even need to create this snapshot on the Primary system - all the data is most likely on the Secondary system, so there's no need to hold up space and initiate a transfer (see the section about weekly snapshots on the secondary in TR-3487).

Thanks!

Jeremy
