Hello.
Background
A while back, when we first started using SMHV, we were stumped as to how we could back up the snapshots from the filer to tape for long-term storage. To solve that problem, we created a simple and effective solution (http://communities.netapp.com/message/22958#22958) which allowed us to back up to tape, and to back up the individual VHD files, without having to back up via NDMP. With NDMP, if a restore were required, we would have to first restore the entire LUN and only then restore the VHD from that LUN. We used this approach for over a year and were content with it. In the meantime, I haven't heard of any other method used by anyone that would make us think otherwise.
Problem A
Recently, our filer has started to experience peaks where the CPU is pegged at 100% (or close to it) for a while, causing our Hyper-V servers to temporarily lose connectivity to the filer for a short period of time, which caused all the VMs running on those boxes to crash and restart. Apparently, this is because the iSCSI initiator cannot maintain its connection to the filer while the filer is too busy to respond in time. This happens to our 2008 R2 servers. Our 2003 servers, which run LOB apps, simply blue screen, crash, and reboot, and the debug analysis points to the iSCSI initiator as the cause. (I mention this as background, but I also wonder whether anyone else has experienced this issue.)
We opened a case with NetApp, and after analyzing perfstat output, they determined that the high CPU utilization is indeed caused by the way we back up our Hyper-V LUNs to tape. The explanation given was that the backup generates a large number of "sparse reads", which make the filer's CPU spike, which in turn causes our systems to crash. 😞
So now we are back at square one, looking for a way to back up SMHV snapshots to tape.
Possible Solution A
The following is my analysis of the options presently available. I would love to hear what other people are doing.
Backing up via NDMP (as opposed to backing up individual VM Disk files)
- Problem A:
- There is no "_recent" snapshot, so I can't think of a way to build a static backup selection list (in whatever backup program you use) that specifies the Volume/QTree/LUN path, since the path always contains the dated snapshot name, e.g. _todaysdate_snapshot_servername_etc_backup (i.e. it is dynamic).
- We did think of running a post-script that would rename the latest backup snapshot to _recent (or similar), which would solve this issue, but we would then need a way to rename it back to its original name so that snapshot rotation and policy processing keep working. We could do this as a post-script job once the backup to tape has completed, but the timing has to line up with the schedule, and we would need to keep track of the previous snapshot names, etc. It sounds a bit too complicated to coordinate.
- Problem B:
- Since you cannot select content to back up via NDMP at the file level (i.e. an individual LUN file), we can only make selections at the QTree or Volume level. In the SMHV snapshots we see (the _backup snapshot) after the SMHV job has finished, there are two LUNs, one with the "_exclude" extension and one without. We only need to back up the one without _exclude, but since we can only select the entire Volume/QTree, we have to back up both LUN files. This essentially doubles the amount of data we back up, which wastes tape and unnecessarily extends the backup window.
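One way to avoid needing a static "_recent" name, instead of renaming snapshots back and forth, would be a pre-backup script that looks up the newest SMHV snapshot and generates the selection path fresh each time. The Python sketch below is only an illustration under several assumptions: the snapshot names are taken from `snap list` output you have already captured, the timestamp format (MM-DD-YYYY_HH.MM.SS) and "_backup" suffix are hypothetical stand-ins for whatever SMHV actually produces on your filer, and the `/vol/<volume>/.snapshot/<snapshot>/<qtree>` layout is assumed to be a path your NDMP backup application will accept. The function names are made up for the example.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical timestamp embedded in SMHV snapshot names; adjust to match
# the names your filer actually shows.
STAMP_RE = re.compile(r"(\d{2}-\d{2}-\d{4}_\d{2}\.\d{2}\.\d{2})")
STAMP_FMT = "%m-%d-%Y_%H.%M.%S"


def latest_smhv_snapshot(names: Iterable[str]) -> Optional[str]:
    """Return the newest snapshot whose name ends in '_backup', or None."""
    best_name, best_time = None, None
    for name in names:
        if not name.endswith("_backup"):
            continue  # skip hourly/nightly and other non-SMHV snapshots
        match = STAMP_RE.search(name)
        if match is None:
            continue  # name does not follow the assumed pattern
        # Compare parsed datetimes, not strings: MM-DD-YYYY does not sort lexically.
        stamp = datetime.strptime(match.group(1), STAMP_FMT)
        if best_time is None or stamp > best_time:
            best_name, best_time = name, stamp
    return best_name


def ndmp_selection_path(volume: str, snapshot: str, qtree: str) -> str:
    """Build a path into the snapshot directory (assumed 7-Mode style layout)."""
    return f"/vol/{volume}/.snapshot/{snapshot}/{qtree}"


if __name__ == "__main__":
    snaps = [
        "hourly.0",
        "hv_dataset_03-14-2011_22.00.05_hvhost1_backup",
        "hv_dataset_03-15-2011_22.00.04_hvhost1_backup",
        "nightly.0",
    ]
    newest = latest_smhv_snapshot(snaps)
    if newest is not None:
        print(ndmp_selection_path("hyperv_vol", newest, "vms"))
```

This only addresses the dynamic snapshot name. It does not help with Problem B, because the selection is still made at the QTree level and would still include the _exclude LUN.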
Problem B
Until recently we were backing up the data from our production filer. We have now purchased a second filer for our DR site and have started replicating the data to the remote location. So now we need to choose the source of the data we want to back up. Well, that depends on how we do replication.
If we use QTree replication, then:
- There is no _backup snapshot (or really any similar snapshot, for that matter) on the destination volume.
- I cannot use a LUN clone to create a read-only copy of the volume (and LUN), since the destination is locked by SnapMirror. I would need a FlexClone license, which costs extra money.
- Ideally I want to back up a consistent snapshot, where I know that SMHV quiesced the writers in the VMs and applications. Backing up a SnapMirror copy just backs up my VMs in whatever state the blocks have arrived on the other filer, regardless of whether or not the VM is fully up to date. From what I understand, in the worst case, booting a VM from such a backup would produce a boot-up error complaining that the OS wasn't shut down properly, but aside from that the files inside the VM should be OK.
If we use Volume level replication, then:
- The snapshots get replicated as-is, including the OS-level consistency that SMHV backups give me.
- I am not sure how to tell when I can back up the replicated snapshot, since I need to verify that the snapshot has been fully replicated to the DR site.
- I am stuck with the same issue as above: there is no "_recent" snapshot I can point to as a fixed selection of what I always want to back up, so I would have to manually create the selections for each backup window.
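For the volume-level replication case, one possible check, assuming that a snapshot only becomes visible on a volume SnapMirror destination once the transfer carrying it has completed, is to compare the SMHV snapshot names on the source with those on the DR volume and back up the newest one present on both. The Python sketch below assumes both name lists were captured beforehand (for example from `snap list` on each filer) and uses the same hypothetical naming pattern as before; `replicated_backup_snapshot` and its return convention are illustrative only and not part of any NetApp tool.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical SMHV naming pattern: <dataset>_<MM-DD-YYYY_HH.MM.SS>_<host>_backup
STAMP_RE = re.compile(r"(\d{2}-\d{2}-\d{4}_\d{2}\.\d{2}\.\d{2})")
STAMP_FMT = "%m-%d-%Y_%H.%M.%S"


def smhv_snapshots_by_age(names: Iterable[str]) -> List[str]:
    """Return the '_backup' snapshots, sorted oldest to newest by timestamp."""
    dated = []
    for name in names:
        match = STAMP_RE.search(name)
        if name.endswith("_backup") and match:
            dated.append((datetime.strptime(match.group(1), STAMP_FMT), name))
    return [name for _, name in sorted(dated)]


def replicated_backup_snapshot(
    source: Iterable[str], destination: Iterable[str]
) -> Tuple[Optional[str], bool]:
    """Pick the newest source SMHV snapshot that already exists on the DR volume.

    Returns (name, is_current): is_current is True only when that snapshot is
    also the newest SMHV snapshot on the source.
    """
    on_source = smhv_snapshots_by_age(source)
    on_dest = set(destination)
    replicated = [name for name in on_source if name in on_dest]
    if not replicated:
        return None, False
    newest = replicated[-1]
    return newest, newest == on_source[-1]


if __name__ == "__main__":
    src = [
        "hv_dataset_03-14-2011_22.00.05_hvhost1_backup",
        "hv_dataset_03-15-2011_22.00.04_hvhost1_backup",
    ]
    dst = ["hv_dataset_03-14-2011_22.00.05_hvhost1_backup", "hourly.0"]
    print(replicated_backup_snapshot(src, dst))
```

In the example, the 03-15 snapshot has not yet reached the DR volume, so the function returns the 03-14 snapshot with `is_current` set to False; a scheduling script could then either wait and retry or knowingly back up the older replicated copy.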
As this post is getting rather long, in general I was wondering: what do people do to solve these issues, and does anyone have any ideas or potential solutions?
Thanks,
Reuvy