I recently upgraded SnapDrive/SnapManager on a Windows Server 2008 R2 SP1 host to SnapDrive 6.5 and SnapManager 6.0.1.1431. At the time I upgraded, supposedly there were no other changes being made to the database server. The server is a VMware virtual machine on vSphere 5.5 and the NetApp solution is a 3220 HA pair running ONTAP 8.1.2 7-Mode. The server is running SQL Server 2008 R2 SP2. I believe this configuration is supported according to the interop matrix.
Prior to updating this software, the hourly full SMSQL backups we were taking would take 3-5 minutes to complete. Post upgrade, I'm seeing different behavior (most of the time, at least). The syntax of my job is as shown below; I had to change it from what it was prior to the upgrade to get it to correctly pick up both SQL instances. From reading the documentation, it seems that "new-backup -svr 'SERVERNAME' -d 'SERVERNAME' etc, etc, etc" should back up all instances on the server. This was not the case post-upgrade: only the default instance was getting backed up. So, I changed it to what is shown below.
NEW JOB SYNTAX:
new-backup -svr 'SERVERNAME' -d 'SERVERNAME', '0', 'SERVERNAME\NAMEDINSTANCE', '0' -RetainBackupDays 10 -RetainShareBackupDays 0 -cpylgbkshare NOTHING_TOSHARE -lb -bksif -RetainSnapofSnapInfoDays 10 -rudays 10 -updmir -mgmt standard
Old job syntax (pre software upgrade) for comparison - this job ran fine under the old software versions:
new-backup -svr 'SERVERNAME' -RetainBackupDays 10 -lb -bksif -RetainSnapofSnapInfoDays 10 -rudays 10 -updmir -mgmt standard
This new syntax seems to work and backs up the databases successfully. However, the time that the job takes to complete (backing up the same databases as with the old SnapDrive/SnapManager software versions) has increased significantly, but not every time it is run.
Now, when I run SMSQL backups called via a scheduled task, jobs that run between 7 am and 1 am complete in 11-12 minutes. Jobs that run between 2 am and 6 am complete in 2-3 minutes. Yes, you'd be right to point out that the job runs quickly during off-peak hours. However, there's nothing I'm aware of going on between 10 pm and 1 am (just as a simple example) that should be hitting the server any harder than between 2 am and 6 am. And again, I never once saw this issue prior to upgrading the SnapDrive/SnapManager packages.
This excess time the jobs take seems to be due to SMSQL not retrieving the SQL Server database information successfully. When the jobs take longer to run, I see the event timeline display similar to the following:
- 10:00:01 am - Event 308 logged - "SnapManager for SQL Server per-server license is licensed on server SERVERNAME"
- 10:10:22 am (This is the next SMSQL event logged) - Event 368 logged - "SQL Server database information was retrieved successfully."
- From that point on things start moving in the expected time frame
When the jobs run as I would have initially expected (from 2-6 am) I see that the database information is retrieved in a much more timely manner:
- 6:00:03 am - Event 308 logged - "SnapManager for SQL Server per-server license is licensed on server SERVERNAME"
- 6:00:21 am (This is the next SMSQL event logged) - Event 368 logged - "SQL Server database information was retrieved successfully."
- The job then completes within a couple of minutes.
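To put a number on the gap between event 308 and event 368 in each case, here is a minimal Python sketch. It only does arithmetic on the timestamps quoted in the two timelines above; the function name is something I made up for illustration, and it assumes both events fall on the same day and that the times are in the "h:mm:ss am/pm" form shown here.

```python
from datetime import datetime

def retrieval_delay_seconds(license_event_time: str, db_info_event_time: str) -> int:
    """Seconds between event 308 and event 368, assuming both are on the same day."""
    fmt = "%I:%M:%S %p"  # 12-hour clock with am/pm, as in the timelines above
    start = datetime.strptime(license_event_time, fmt)
    end = datetime.strptime(db_info_event_time, fmt)
    return int((end - start).total_seconds())

# Timestamps copied from the two event timelines in this post.
slow_run = retrieval_delay_seconds("10:00:01 am", "10:10:22 am")
fast_run = retrieval_delay_seconds("6:00:03 am", "6:00:21 am")

print(f"slow run: {slow_run} s, fast run: {fast_run} s")
# slow run: 621 s, fast run: 18 s
```

So the database information retrieval step alone accounts for a bit over 10 minutes on the slow run versus 18 seconds on the fast run, which matches the difference in total job time.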
I'm trying to figure out why I'm getting this 10 minute delay on the backups between 7 am and 1 am. Someone from our DBA group is also seeing that the query (or queries) coming from SMSQL on the delayed jobs seems to hit the server harder during this time frame, but they haven't been able to pinpoint anything other than to say the queries are taking a long time to complete.
Due to the nature/source of this issue, other SMSQL tasks are impacted as well (for example, database cloning [as one might expect] since we seem to be having issues retrieving the databases in a timely manner).
Does anyone have any ideas as to how I might go about further troubleshooting this, so that the time it takes to retrieve the SQL Server database information gets back to a reasonable range and I don't hit a 10 minute delay on any SMSQL task I run between the hours of 7 am and 1 am? Any suggestions would be greatly appreciated, thanks!