Data Backup and Recovery

SnapManager for Exchange - VSS Writer error

penningtonkr
7,503 Views

Hi all,

I've installed the latest versions of SnapDrive and SnapManager on a new Server 2008 R2 / Exchange 2010 server and completed the configuration wizard successfully. However, whenever I try to take a backup of the server I get the following error:

Backup SG/DB [General] Error: SnapManager detected the following Exchange writer error. Please wait until the system load subsides, then retry SnapManager operation.

VSS_E_WRITERERROR_TIMEOUT: The writer failed due to a timeout between freeze and thaw.

I have tried the backup at several different points during the day, even out of hours, with the same result. The server doesn't seem to be generating much I/O when I check. Has anyone else encountered this error?

The drives on the server are set up as RDM LUNs.

Best Regards,

Kirk

6 REPLIES

bjornkoopmans

Hi Kirk,

Not helping the discussion, I'm sure, but I have to get this off my chest. Don't get me started on VSS errors with SME. 😞 I have seen them all: timeouts, vetoes, retryable errors, undetermined errors, etc. Sometimes I think it's a miracle when a backup does succeed!

Kind regards, Bjorn

BrendonHiggins

MSDN recommends:

The writer operation failed because of a time-out between the Freeze and Thaw events. The recommended way to handle this error code is to wait ten minutes and then repeat the operation, up to three times.
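
The wait-and-retry guidance quoted above could be scripted roughly as follows. This is only a sketch: `Invoke-WithRetry` is a hypothetical helper name, and the script block you pass in is a placeholder for however you start your SME backup. It assumes that command throws a terminating error on failure; non-terminating errors would not trigger a retry.

```powershell
# Hypothetical helper (not part of SME or Exchange): run a script block and,
# if it throws, wait and run it again.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$Operation,   # placeholder for the backup call; must throw on failure
        [int]$MaxRetries = 3,      # "repeat the operation, up to three times"
        [int]$WaitSeconds = 600    # "wait ten minutes"
    )

    # One initial attempt plus up to $MaxRetries repeats.
    for ($retry = 0; $retry -le $MaxRetries; $retry++) {
        try {
            return & $Operation
        }
        catch {
            # Out of retries: rethrow the last error to the caller.
            if ($retry -eq $MaxRetries) { throw }
            Write-Warning ("Attempt {0} failed: {1}. Retrying in {2} seconds." -f ($retry + 1), $_.Exception.Message, $WaitSeconds)
            Start-Sleep -Seconds $WaitSeconds
        }
    }
}

# Usage (the script path is an assumption, substitute your own backup command):
# Invoke-WithRetry -Operation { & 'C:\Scripts\Run-SmeBackup.ps1' }
```

Passing the backup as a script block keeps the retry logic separate from whichever backup command you actually use.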

Google turns up no clear solution for this issue, but other backup products with the same problem have reportedly been fixed by uninstalling and then reinstalling the backup application (SME in this case).

These links may also help.


http://support.microsoft.com/kb/975928
http://support.microsoft.com/kb/976329
http://support.microsoft.com/kb/975832
http://support.microsoft.com/kb/972135
http://support.microsoft.com:80/kb/970770

I'd recommend getting NetApp support on the phone, however.

Hope it helps

Bren

penningtonkr

OK, I have tried a few different things now. Firstly, the timeout error seems worrying, as the system is under low load at present. I changed the backup job to back up only one storage group at a time, and each job is now successful. It looks like if you try to back up all the storage groups in the DAG at once, it simply can't hold the I/O for long enough.

If someone knows of a registry key or similar to increase the timeout, that would be a starting point.

So for now we have working backups, but as 4 separate jobs covering the 4 storage groups.
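
For anyone who wants to drive the one-group-per-job approach from a script, here is a minimal sketch. `Invoke-BackupPerGroup` is a hypothetical helper, the group names are made up, and the `-BackupOne` script block is a placeholder for your own per-group backup command (assumed to throw on failure). It runs the groups strictly one after another and reports which ones failed instead of stopping at the first error.

```powershell
# Hypothetical helper (not part of SME): run one backup per group, in sequence,
# and emit one result object per group.
function Invoke-BackupPerGroup {
    param(
        [Parameter(Mandatory = $true)]
        [string[]]$GroupNames,     # names of the storage groups / databases (assumed)
        [Parameter(Mandatory = $true)]
        [scriptblock]$BackupOne    # placeholder: receives one group name, throws on failure
    )

    foreach ($name in $GroupNames) {
        try {
            # Discard the backup command's own output so only result objects are returned.
            & $BackupOne $name | Out-Null
            New-Object PSObject -Property @{ Group = $name; Succeeded = $true; Error = $null }
        }
        catch {
            # Record the failure and carry on with the next group.
            New-Object PSObject -Property @{ Group = $name; Succeeded = $false; Error = $_.Exception.Message }
        }
    }
}

# Usage (names and script path are assumptions):
# Invoke-BackupPerGroup -GroupNames 'SG1','SG2','SG3','SG4' -BackupOne {
#     param($sg) & 'C:\Scripts\Run-SmeBackup.ps1' $sg
# }
```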

Has anyone come across anti-virus causing snapshot issues?

bjornkoopmans

Good to hear! Coincidentally, I tried the exact same thing, with promising results. Indeed, it seems the system cannot hold the writes long enough to allow a VSS snapshot to be created when backing up a large number of databases/SGs at the same time. We are also working with 4 separate jobs now, and so far it seems to be working properly.

Note that SME already splits the job into subjobs, but the threshold seems to be 30 databases. It might be a good idea for a future release of SME to set the threshold at, say, 10. Or perhaps someone knows how to set this threshold manually?

Bjorn

adamgross

Antivirus can definitely cause I/O contention issues. Try disabling it and see if the results are different.

BrendonHiggins

Are you still having this problem? I had it yesterday with one of the databases on my server. The other 15 could back up without problems; it was just the one database which failed.

Confirmed lots of free space in the LUNs and flexvols for both the database and transaction logs.

As we have multiple copies, I used the Exchange PowerShell commands below to suspend and resume the database copy without taking Exchange down.

     Suspend-MailboxDatabaseCopy DatabaseName\ServerName

Wait 2 minutes.

Create a SnapDrive snapshot of the LUN (I just needed to bring my SnapVaults forward).

     Resume-MailboxDatabaseCopy DatabaseName\ServerName

The error is gone today and all the Exchange databases are backing up via SME without issue.
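
The suspend / wait / snapshot / resume sequence above could be wrapped so that the copy is always resumed, even if the snapshot step fails. This is a sketch under assumptions: `Invoke-WithCopySuspended` is a hypothetical helper, it has to run in the Exchange Management Shell, and the snapshot step is left as a placeholder script block because the exact SnapDrive command is not given in this thread. Note that the copy identity for these cmdlets is written as DatabaseName\ServerName.

```powershell
# Hypothetical helper: suspend one database copy, run a snapshot step,
# and resume the copy afterwards whether or not the snapshot step throws.
function Invoke-WithCopySuspended {
    param(
        [Parameter(Mandatory = $true)]
        [string]$CopyIdentity,        # format: DatabaseName\ServerName
        [Parameter(Mandatory = $true)]
        [scriptblock]$SnapshotStep,   # placeholder for your SnapDrive snapshot command
        [int]$SettleSeconds = 120     # the "wait 2 minutes" from the post
    )

    # Stop here if the suspend itself fails; nothing needs resuming in that case.
    Suspend-MailboxDatabaseCopy -Identity $CopyIdentity -Confirm:$false -ErrorAction Stop
    try {
        Start-Sleep -Seconds $SettleSeconds
        & $SnapshotStep
    }
    finally {
        # Runs on success and on failure, so the copy is not left suspended.
        Resume-MailboxDatabaseCopy -Identity $CopyIdentity
    }
}

# Usage (database, server and script path are assumptions):
# Invoke-WithCopySuspended -CopyIdentity 'DB1\MBX1' -SnapshotStep { & 'C:\Scripts\Take-Snapshot.ps1' }
```

The `try`/`finally` is the main point of the wrapper: a failed snapshot would otherwise leave replication suspended until someone notices.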

Hope it helps

Bren
