Data Protection
I am planning to optimize our SQL Server 2008 high availability with SnapManager for SQL. The idea is that the SQL Server runs as a VM on a VMDK over NAS (OS, SQL Server, system databases) and the user databases are located on LUNs. We would take a snapshot with SnapManager for VMware to recover the system (OS, SQL, system DBs) and SnapMirror it to a volume on the DR site. We would also take SQL snapshots with SnapManager for SQL and SnapMirror the LUNs to the DR site. The intention is that we don't need two SQL Servers (with all jobs etc.). Currently we are using a SQL mirroring solution.
I have attached a file that describes my idea more precisely. Is this a useful concept?
The problem is that SMSQL and SMVI interfere with each other.
The following error occurs in the Windows event log when I take a snapshot with SMVI after I have taken a snapshot with SMSQL:
The default transaction resource manager on volume \\?\Volume{df094da9-87d0-11e2-95dd-005056b26dac} encountered a non-retryable error and could not start. The data contains the error code.
I have been working on a similar design for my company. I have found workarounds for most, but not all, of the issues. In my environment, I could not take a snapshot from SMVI and SMSQL at the same time either. For us, the SMVI snapshot would work, but then SMSQL would see another type of snapshot and fail. To get around that, I had to script a "consolidate snapshots" command on the SMVI side for the OS drive.
Where I am stuck is actually using the snapmirrored volumes. I can mount the mirrored volumes at the hot DR site, but the more heavily used DBs show as "SUSPECT". I have created a script to use in SMVI instead of the prebuilt GUI; it gracefully stops the SQL services, suspends the VM, takes the OS volume snapshot, and then restarts it all. I still get the SUSPECT DBs.
Maybe someone smarter than I can suggest a workable solution.
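The stop, suspend, snapshot, restart ordering described above can be sketched as follows. This is only an illustration of the sequencing, not the actual SMVI script: the five step functions are hypothetical placeholders that you would back with whatever commands your environment uses, and the only point made here is that the VM and the SQL services get brought back even if the snapshot step fails.

```python
from typing import Callable

Step = Callable[[], None]


def quiesced_os_snapshot(
    stop_sql: Step,
    suspend_vm: Step,
    take_snapshot: Step,
    resume_vm: Step,
    start_sql: Step,
) -> None:
    """Run the quiesce/snapshot steps in order and always undo what was done.

    All five callables are hypothetical hooks supplied by the caller; this
    function only enforces the ordering and the cleanup.
    """
    sql_stopped = False
    vm_suspended = False
    try:
        stop_sql()
        sql_stopped = True
        suspend_vm()
        vm_suspended = True
        take_snapshot()
    finally:
        # Undo in reverse order, and only the steps that actually completed.
        if vm_suspended:
            resume_vm()
        if sql_stopped:
            start_sql()
```

If the snapshot step raises, the exception still propagates to the caller after the VM has been resumed and SQL restarted, so a scheduler can report the job as failed.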
Ok, since no one else has responded here, I will make an attempt. First off, if you haven't already, I would suggest you both take a look at TR-4003. It gives a basic overview of our support for SQL in ESX. I would also suggest that you do not put your user databases on the same NetApp volume as your OS, SQL binaries, and system databases. I imagine this is why SMVI and SMSQL are "fighting" over snapshots. It is also important to note that VM snapshots are typically not application (SQL) consistent, which may explain why you are getting suspect databases when you attempt to restore. If you restore from a snapshot created by SMSQL, your databases will have been put into a consistent state and your .mdf and .ldf files verified as good. When you mount the mirrored volume, are you mounting the SnapMirror snapshot, or the snapshot created by SMSQL? Is your mirror being initiated by SMSQL or some other process?
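To see which databases came up unhealthy after mounting a mirrored volume, one option is to read the state_desc column of sys.databases, which SQL Server sets to values such as ONLINE, RECOVERY_PENDING, or SUSPECT. Below is a minimal sketch, assuming you fetch the rows with whatever client you already use; the helper name and the sample rows in the usage note are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Query to run against the recovered instance with the client of your choice.
STATE_QUERY = "SELECT name, state_desc FROM sys.databases"


def unhealthy_databases(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (name, state_desc) pairs whose state is anything but ONLINE.

    `rows` is expected to be the result set of STATE_QUERY.
    """
    return [(name, state) for name, state in rows if state.upper() != "ONLINE"]
```

For example, passing the hypothetical rows `[("master", "ONLINE"), ("Sales", "SUSPECT")]` would return only the `Sales` entry, which gives the DR staff a short list of databases to restore from the SMSQL snapshot instead.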
Here is our setup, all on VMDKs:
C: has the OS, SQL install and system DBs
E: has the database files
L: and S: are on the same volume and have the log files and SnapInfo files respectively.
The C: drive was being snapped by SMVI with the other drives removed from the job. The other drives were snapped by SMSQL. Our oversimplified intent was to have the staff at the remote site simply be able to break the mirror, repoint the drive mappings in vCenter and fire up the VM. Most of our expertise is at the main data center, and should a disaster occur while we are in the office, we wanted it to be as simple as possible for those at our DR site.
We will need to rework our vCenter implementation, as its database is on one of the SQL servers we are trying to recover. If we can't bring vCenter up in the DR site, we can't have SMSQL restore from the snapshot...or am I incorrect that it is a dependency? That raises another question though: if we go to SQL Express on the vCenter server, will that be snapped consistently so we can mirror that VM to the DR site?
In reading through the article you referenced, it seems that our simplified process is unattainable due to the crossover between the two SnapManager products. Maybe a feature request could come from this forum post to have a version of SMVI that is SMSQL aware, to achieve the higher level of simplification for DR recovery.
Thanks for your help welch!
Mike
Mike,
I've asked around and it sounds like what you are trying to do can be accomplished with VMware's SRM. Check out TR-3671 and see if it gets you anywhere. I believe you want to have vCenter up and running in your DR site as well as your primary site, and then fail over from one to the other. This should preclude the necessity of replicating the vCenter databases.
-- Justin