Data Protection
We've been encountering the following errors for one of our databases that utilizes SnapManager when a full backup is attempted. The errors (in reverse chronological order) are as follows:
Source: SnapManager for SQL Server
EventID: 311
SnapManager for SQL Server online snapshot based full database backup failed.
Error Code: 0xc0040836
The thread that prepares the backup timed out before preparing for snapshot creation.
Source: SnapManager for SQL Server
EventID: 364
VDI operation failed for database 'DatabaseName' of on SQL Server 'InstanceName'.
Error Code: 0xc00408d4
The VDI backup thread received a termination request as a result of one or more SQL database backups failing, or the SnapDrive failing to prepare for snapshot creation.
Source: MSSQL$PROD
EventID: 18210
BackupVirtualDeviceFile::PrepareToFreeze: failure on backup device 'DeviceName'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
This SharePoint database is roughly 250 GB in size, split across two LUNs (a single 210 GB data file on the Data LUN and a single 40 GB log file on the Log LUN). The other databases on the instance all back up without issue. The largest of them is 330 GB (a 290 GB data file and a 40 GB log file on the same respective LUNs), so we're not seeing any correlation with size.
What appears to be happening is that the PrepareToFreeze command reaches a timeout (default of 1800 seconds), resulting in a backup failure. The error codes above do not map to any Microsoft-specific error codes, so I have to assume this is an issue specific to the NetApp SnapManager tool (we're running 5.1).
Additionally, this database was being successfully backed up until just a few days ago, so it does not look to be related to a SnapManager configuration issue (per http://communities.netapp.com/thread/1784), nor can I find any evidence it's related to a problem with the SQL index services on the database. Also, no new databases were added so we're not running into a max number of DBs per LUN (as per http://communities.netapp.com/thread/4764) either.
The database engine is a clustered SQL Server 2008 R2 Enterprise instance. The OS is Windows Server 2008 R2 Enterprise, 64-bit, with 72 GB RAM.
Any help would be appreciated.
Thanks,
John Eisbrener
For those of you stumbling across this, the error turned out to be caused by the SnapBackup and a SharePoint farm backup operation running at the same time. This is only conjecture, but we believe that the VDI PrepareToFreeze call was waiting for the SharePoint farm backup operation to finish before it would return success and proceed with the VSS backup. However, the allotted (default) 30-minute timeout for this call was reached first, which triggered the errors reported in my initial post. Once we tracked down these conflicting backup operations, we rescheduled them so they no longer interfere, and we have yet to experience this issue again.
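For anyone who wants to check their own schedules for this kind of collision, here is a minimal sketch in Python. It is not part of SnapManager or SharePoint; the job names, times, and the `find_conflicts` helper are all hypothetical, and it assumes you have already collected each job's typical start and end time by hand (for example from the job history).

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class BackupWindow(NamedTuple):
    name: str        # hypothetical job label
    start: datetime  # typical start time of the job
    end: datetime    # typical end time of the job


def find_conflicts(windows, guard=timedelta(0)):
    """Return (earlier_job, later_job) name pairs whose windows overlap.

    `guard` pads the end of each window, so jobs that start too soon
    after another one finishes are also reported.
    """
    ordered = sorted(windows, key=lambda w: w.start)
    conflicts = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end + guard:
                break  # sorted by start, so no later job can overlap `a`
            conflicts.append((a.name, b.name))
    return conflicts


# Hypothetical schedule, made up for illustration only.
night = datetime(2011, 1, 1)
windows = [
    BackupWindow("snapmanager_full_backup",
                 night + timedelta(hours=1), night + timedelta(hours=1, minutes=20)),
    BackupWindow("sharepoint_farm_backup",
                 night + timedelta(minutes=30), night + timedelta(hours=2)),
    BackupWindow("index_maintenance",
                 night + timedelta(hours=3), night + timedelta(hours=3, minutes=30)),
]

for earlier, later in find_conflicts(windows):
    print(f"{later} starts while {earlier} is still running")
```

Passing a `guard` at least as long as the 30-minute timeout would also flag jobs scheduled uncomfortably close together, not only those that strictly overlap.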
Hope this helps someone else down the road. Thanks,
John Eisbrener