Backing Up SMHV Snapshots to Tape

israelmmi · ‎2011-07-31

Hello.

Background

A while back, when we first started using SMHV, we were stumped as to how to back up the snapshots from the filer to tape for long-term storage. To solve that problem, we created a nice and effective solution (http://communities.netapp.com/message/22958#22958) which allowed us to back up to tape, and to back up the individual VHD files, without going through NDMP. With NDMP, a restore would mean first restoring the entire LUN and only then restoring the VHD from that LUN. We ran this way for over a year and were content with the solution. In the interim, I haven't heard of any other method that made us think otherwise.

Problem A

Recently, our filer has started to experience peaks where the CPU is pegged at 100% (or close to it) for a while, causing our Hyper-V servers to temporarily lose connectivity to the filer, which in turn caused all the VMs running on those boxes to crash and restart. Apparently, this is related to the iSCSI Initiator not being able to maintain a connection with the filer while the filer is too busy to respond in time. This happens to our 2008 R2 servers. Our 2003 servers, which are running LOB apps, just blue screen, crash, and reboot, and the error shown in debug analysis points to the iSCSI initiator. (I mention this as a preface, although I wonder if anyone else has ever experienced this issue.)

We opened a case with NetApp, and after analyzing perfstat outputs, it was determined that the high CPU utilization is indeed caused by the way we back up our Hyper-V LUNs to tape. The explanation given was that the backup causes a high number of "sparse reads", which makes the filer's CPU spike, which in turn causes our systems to crash. 😞

So now we are back at square one, looking for a way to back up SMHV snapshots to tape.

Possible Solution A

The following is my analysis of the options presently available. I would love to hear from other people what they are doing.

Backing up via NDMP (as opposed to backing up individual VM Disk files)

Problem A:
- There is no "_recent" snapshot, so I can't think of a way to build a static backup selection list (using whatever program you typically use) that specifies the Volume/QTree/LUN path, since the path will always be _todaysdate_snapshot_servername_etc_backup (i.e. dynamic).
- We did think of having a post-script rename the latest backup snapshot to _recent (or similar), which would solve this issue, but then we would need a way to rename it back to its previous name for snap rotation and policy processing. We could do this in a post-script job after the backup to tape has completed, but the timing needs to fit the schedule, and we need to make sure we store the previous snapshot names, etc. It sounds a little too complicated to coordinate.
Problem B:
- Since NDMP cannot select content to back up at the file level (i.e. a LUN file), we can only select items at the QTree or Volume level. In the SMHV snapshot we see after the SMHV job has finished (the _backup snapshot), there are two LUNs, one with the "_exclude" extension and one without. We only need to back up the one without _exclude, but since we can only select the entire Volume/QTree, we have to back up both LUN files. This essentially requires us to back up twice the data, which wastes tapes and needlessly extends the backup window.
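The rename-and-rename-back idea from Problem A could be sketched roughly as follows. This is only an illustration under assumptions: the alias `_recent`, the state file, and the function names are hypothetical, and the generated command assumes the Data ONTAP 7-Mode `snap rename <volume> <old> <new>` syntax, which should be checked against your release. The sketch only builds the command strings and records the original name; how the commands reach the filer (rsh, ssh, API) is left out.

```python
import json
from pathlib import Path

ALIAS = "_recent"  # hypothetical fixed name used in the static selection list


def plan_alias(volume, snapshot_name, state_file):
    """Remember the original snapshot name, then return the rename-to-alias command."""
    state = {"volume": volume, "original": snapshot_name}
    Path(state_file).write_text(json.dumps(state))
    # Assumed 7-Mode syntax: snap rename <volume> <old-name> <new-name>
    return f"snap rename {volume} {snapshot_name} {ALIAS}"


def plan_restore(state_file):
    """Return the command that renames the alias back to the recorded original name."""
    state = json.loads(Path(state_file).read_text())
    return f"snap rename {state['volume']} {ALIAS} {state['original']}"
```

The first function would run in the SMHV post-script and the second in the tape job's post-script, so the snapshot has its original name again before rotation and policy processing look at it. The scheduling risk described above remains: if the tape job overruns into the next SMHV backup, the state file would be overwritten.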

Problem B

Up until recently we were backing up the data from our production filer. We have recently purchased a second filer for our DR site and have started to replicate the data over to the remote location. Now we need to select the source of the data we want to back up. Well - that depends on how we do replication.

If we use QTree replication, then:

- There is no _backup snapshot (or really any similar snapshot, for that matter) on the destination volume.
- I cannot use a LUN clone to create a read-only copy of the volume (and LUN), since it is locked by SnapMirror. I would need a FlexClone license, which costs extra $.
- Ideally I want to back up a consistent snapshot, where I know that SMHV quiesced the writers on the VMs and applications. Backing up a SnapMirror copy just backs up my VMs as the blocks happened to arrive at the other filer, regardless of whether or not the VM is totally up to date. From what I understand, in the worst case, booting a VM from such a backup would produce a boot error complaining that the OS wasn't shut down properly, but aside from that the files inside the VM should be OK.

If we use Volume level replication, then:

- The snapshots get replicated as is, including the OS-level consistency that SMHV backups give me.
- I am not sure how to know when I can back up the replicated snapshot, since I need to verify that the snapshot has been replicated to the DR site in its entirety.
- I am stuck with the same issue as above: there is no "_recent" snapshot I can point a fixed selection at, so I would have to manually create the selections for each backup window.
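One way to approach the "when is it safe to back up the replica" question is to poll the DR filer until the expected snapshot name shows up. The sketch below is a generic polling loop, not a NetApp API: `list_snapshots` is a hypothetical callable you would supply (for example, one that parses `snap list` output from the destination). It also assumes that a snapshot only becomes visible on the destination once its transfer has completed, which should be confirmed for your SnapMirror configuration.

```python
import time


def wait_for_snapshot(name, list_snapshots, timeout_s=3600, poll_s=60,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until `name` appears in list_snapshots(); return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if name in list_snapshots():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The tape job would start only when this returns True, and a False result could raise an alert instead of backing up a stale or partial copy.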

As this post gets longer and longer - in general I was wondering, what do people do to solve these issues, and does anyone have any ideas or potential solutions?

Thanks,
Reuvy

barve · ‎2011-07-31

Hi,

SMHV does not rename the snapshot with a _recent suffix, but it does provide the name of the snapshot that was created as part of the backup process in the $VMSnapshot environment variable. This variable can be checked in the post-backup script to get the name of the snapshot. Can this be used to solve the problem of knowing which snapshot to transfer / back up?
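As a rough sketch of how a post-backup script might use this, the snippet below reads VMSnapshot from the environment and builds a snapshot path for the backup selection. The function name, the volume and qtree arguments, and the `/vol/<volume>/.snapshot/<name>/<qtree>` layout are assumptions for illustration, and it assumes the variable is visible to the script's process as an ordinary environment variable.

```python
import os


def snapshot_backup_path(volume, qtree, env=os.environ):
    """Build the path of the qtree inside the snapshot SMHV just created."""
    name = env.get("VMSnapshot")
    if not name:
        # Fail loudly rather than hand the backup program a wrong path.
        raise RuntimeError("VMSnapshot is not set; was this run as a post-backup script?")
    # Assumed layout of snapshot contents on the filer.
    return f"/vol/{volume}/.snapshot/{name}/{qtree}"
```

The returned path could then be written to whatever selection file or job definition your backup program accepts.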

The backup snapshot (the one with the _backup suffix) has two LUNs (one with _exclude and one without). But only one LUN, the one without _exclude, contains the auto recovery changes, and this is actually a LUN clone. This snapshot only has the auto recovery data. The actual VM data is present in the original LUN in the first (backing) snapshot.

Thanks,

Anagha

israelmmi · ‎2011-08-03

Hi Anagha,

We already know about $VMSnapshot. We use it today in the old backup solution I wrote about in the post mentioned above. The problem is that when we mounted this snapshot and backed it up file by file, it caused CPU spikes on our filer.

When we tried to back up via NDMP, we could only select at the QTree level, which forces us to back up both of the LUNs, the clone and the real LUN. Secondly, we didn't know of a simple way to dynamically change the selection list within our backup program to tell it which snapshot path to back up (assuming we knew the path via the $VMSnapshot variable). That's two negatives.

Thirdly, with QTree SnapMirror we cannot back up this data on the destination filer, since there are no snapshots with similar names there and therefore no consistent data to back up. And even if we used Volume SnapMirror to solve this issue, which would give us the same snapshot names as the source, we would still be faced with the first two problems above.

Make sense?

Thanks,

Reuv

ajeffrey · ‎2011-08-26

It might be overkill for what you need but the Snap Creator framework will take care of problem #3 above. It's a very flexible tool that has options for pre and post snapshot operations, snapvault, renaming based on configurable inputs, etc. They don't spell out SMHV on the Snap Creator main page but that tool has been adding capabilities like wildfire and it might be worth a post to the Snap Creator community to see if it would do the job. From what I have seen it is so flexible that you can write a plugin to do just about anything you need if it doesn't already do it.

There is a white paper on advanced scenarios and Snap Creator here if you are interested...

http://media.netapp.com/documents/tr-3933.pdf

Oh and did I mention...it's free. 🙂