Data Backup and Recovery
Hi all,
I've installed the latest versions of SnapDrive and SnapManager on a new Server 2008 R2 / Exchange 2010 server and completed the configuration wizard successfully. However, whenever I try to take a backup of the server I get the following error:
Backup SG/DB [General] Error: SnapManager detected the following Exchange writer error. Please wait until the system load subsides, then retry SnapManager operation.
VSS_E_WRITERERROR_TIMEOUT: The writer failed due to a timeout between freeze and thaw.
I have tried the backup at several different points during the day, even out of hours, with the same result. The server doesn't seem to be generating much I/O when I check. Has anyone else encountered this error?
The drives on the server are set up as RDM LUNs.
Best Regards,
Kirk
Hi Kirk,
Not helping in the discussion, I'm sure, but I have to get this off my chest. Don't get me started on VSS errors with SME. 😞 I have seen them all: timeouts, veto's, retryable's, undetermined's, etc. Sometimes I think it's a miracle when a backup does succeed!
Kind regards, Bjorn
MSDN recommends the following for VSS_E_WRITERERROR_TIMEOUT:
The writer operation failed because of a time-out between the Freeze and Thaw events. The recommended way to handle this error code is to wait ten minutes and then repeat the operation, up to three times.
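The wait-and-retry advice above could be scripted along these lines. This is only a minimal sketch: `run_backup` is a hypothetical callable standing in for however you trigger the SME job (it is not a SnapManager API), and I am reading "repeat up to three times" as one initial attempt plus up to three retries, which is an assumption.

```python
import time


def retry_backup(run_backup, retries=3, wait_seconds=600, sleep=time.sleep):
    """Run a backup, retrying after a pause if it fails.

    run_backup   -- hypothetical callable; returns True on success, False on failure
    retries      -- how many times to repeat after the first failed attempt
    wait_seconds -- pause between attempts (600 s = the ten minutes MSDN suggests)
    sleep        -- injectable so the pause can be skipped when testing

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    total_attempts = retries + 1
    for attempt in range(1, total_attempts + 1):
        if run_backup():
            return attempt
        # Only wait if another attempt is still to come.
        if attempt < total_attempts:
            sleep(wait_seconds)
    return None
```

If the function returns None, the writer has timed out on every attempt and waiting alone is probably not going to fix it.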
Google turns up no clear solution for this issue, but other backup products with the same error have reportedly been fixed by uninstalling the backup application (SME in this case) and then reinstalling it.
These links may also help.
http://support.microsoft.com/kb/975928
http://support.microsoft.com/kb/976329
http://support.microsoft.com/kb/975832
http://support.microsoft.com/kb/972135
http://support.microsoft.com:80/kb/970770
I'd still recommend getting NetApp support on the phone, however.
Hope it helps
Bren
OK, I have tried a few different things now. Firstly, the timeout error seems worrying as the system is under low load at present. I changed the backup job to only back up one storage group at a time, and each job is now successful. It looks like if you try to back up all the storage groups at once within the DAG, it simply can't hold the I/O for long enough.
If someone knows of a registry key or similar setting to increase the timeout, that would be a starting point.
So for now we have working backups but 4 separate jobs covering the 4 storage groups.
Has anyone come across anti-virus causing snapshot issues?
Good to hear! Coincidentally, I tried the exact same thing, with promising results. Indeed it seems as if the system cannot hold the writes long enough to allow a VSS snapshot to be created when backing up a large number of databases/SGs at the same time. We are also working with 4 separate jobs now and so far it seems to be working properly.
Note that SME already splits up the job into subjobs, but the threshold seems to be at 30 databases. It might be a good idea if a future release of SME set the threshold at, say, 10. Or perhaps someone knows how to set this threshold manually?
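Until there is a way to change that threshold, the splitting can be done by hand when planning the jobs. A rough sketch of the idea, assuming nothing about SME itself: the function name and the storage group names are made up for illustration, and each resulting batch would become its own backup job.

```python
def split_into_batches(databases, batch_size):
    """Split a list of database/storage group names into batches.

    Each batch holds at most batch_size names, in the original order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        databases[i:i + batch_size]
        for i in range(0, len(databases), batch_size)
    ]


# Hypothetical names; one storage group per job, as in the workaround above.
jobs = split_into_batches(["SG1", "SG2", "SG3", "SG4"], batch_size=1)
for number, batch in enumerate(jobs, start=1):
    print(f"Job {number}: {', '.join(batch)}")
```

With a batch size of 1 this gives the four separate jobs we are both running now; a larger batch size would be worth trying to see where the timeouts start.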
Bjorn
Antivirus can definitely cause IO contention issues. Try disabling that and see if results are different.
Are you still having this problem? I had it yesterday with one of the databases on my server. The other 15 backed up without a problem; it was just the one database which failed.
I confirmed there was plenty of free space in the LUNs and FlexVols for both the database and the transaction logs.
As we have multiple copies, I used the Exchange PowerShell commands below to suspend and resume the database copy without taking Exchange down:
Suspend-MailboxDatabaseCopy -Identity DatabaseName\ServerName
Wait 2 minutes.
Create a SnapDrive snapshot of the LUN (I just needed to bring my SnapVaults forward).
Resume-MailboxDatabaseCopy -Identity DatabaseName\ServerName
The error is gone today and all the Exchange databases are backing up via SME without issue.
Hope it helps
Bren