2011-11-03 05:28 AM
We have a NetApp FAS3240 in a MetroCluster configuration and a vSphere 4.1 virtualization infrastructure. On top of it we run a SAP system based on Microsoft SQL Server.
One of the deciding factors in the purchase was the ability to take database backups in minutes without disrupting users, through NetApp FAS3240 Snapshots and the SnapDrive / SnapManager for SQL software. Other storage systems could take hardware snapshots, but only a few could do so consistently.
Well, after installing SnapDrive 6.3.1R1 and SnapManager 5.1P2, then configuring and launching a Full Backup, we found to our disappointment that an error aborts the Full Backup.
[10:41:30.821] All VDI backup threads completed snapshot preparation and databases IO are frozen for snapshot.
[10:41:30.822] SnapManager are ready to create snapshot...
-- 12 minutes after --
[10:53:30.602] Wait for snapshot completion timeout
[10:53:30.605] Error Code: 0xc004084f
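To put an exact number on the gap between those two log entries, here is a minimal Python sketch. It assumes the log lines keep the `[HH:MM:SS.mmm]` prefix shown above and that both entries fall on the same day; the helper names (`parse`, `seconds_between`) are made up for illustration and are not part of SnapManager.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt above.
LOG_LINES = [
    "[10:41:30.821] All VDI backup threads completed snapshot preparation and databases IO are frozen for snapshot.",
    "[10:41:30.822] SnapManager are ready to create snapshot...",
    "[10:53:30.602] Wait for snapshot completion timeout",
    "[10:53:30.605] Error Code: 0xc004084f",
]

# Assumed line format: "[HH:MM:SS.mmm] message"
STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$")


def parse(line):
    """Return (timestamp, message), or None if the line has no timestamp prefix."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%H:%M:%S.%f"), match.group(2)


def seconds_between(lines, start_text, end_text):
    """Seconds from the first line containing start_text to the next line containing end_text."""
    start = end = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if start is None and start_text in message:
            start = stamp
        elif start is not None and end_text in message:
            end = stamp
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


elapsed = seconds_between(
    LOG_LINES,
    "ready to create snapshot",
    "Wait for snapshot completion timeout",
)
print(f"{elapsed:.3f} s")
```

For the excerpt above this gives 719.780 s, that is, just under 12 minutes between "ready to create snapshot" and the timeout.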
If we launch a similar Full Backup task on our test SAP system (identical to the production SAP system but without user load), the task finishes successfully, but the snapshot phase lasts 3 minutes, which we think is a very long time for a simple snapshot (it should take seconds, shouldn't it?).
Our SAP system is installed on a virtualized (vSphere 4.1) MS Windows 2008 R2 Enterprise machine using MS SQL Server 2008 R2 Enterprise Edition. The database is about 1.3 TB, divided into 8 datafiles located in different VMDKs, different datastores, different LUNs and different volumes. The log file is also on a separate volume.
SnapDrive integrates with VSC 2.1.1, which is installed in a virtual machine together with OnCommand Core 5.0 and OnCommand Host 1.1. This virtual machine is separate from the VM where vCenter is installed, and from the VM where SAP (and therefore SnapDrive and SMSQL) is installed.
Please find attached both SMSQL Full Backup logs: one for the SAP Production System (ended with the snapshot timeout error) and one for the SAP Test System (ended successfully, but with 2 minutes of SQL Server IO freezing).
Thank you very much in advance.
2011-11-04 06:27 AM
A few minutes ago we performed a very interesting test. We stopped the Production SAP System (only the SAP instance, not the SQL DB). This brought the user load down to nearly 0 (except for direct DB connections from external systems), and we then launched a new SnapManager for SQL Full Backup.
The Backup has finished successfully!
So we can state that when there is a high user load connected to SAP, and therefore to the SQL DB, SnapManager for SQL times out after 12 minutes while trying to do one of the following:
- Freeze SQL IO? I don't think so; the log files say "databases IO are frozen for snapshot" before the 12-minute period...
- Create the snapshots? That makes no sense to me. Taking a snapshot is a very quick task.
2011-11-06 03:12 PM
It's a strange one, as SnapManager products have a 10-second timeout for creating a snapshot once the DBs are frozen; if the snapshot is not created in that time, the DB is unfrozen and the backup job is aborted.
I recommend opening a Tech Support case with NetApp GSC so this can be investigated by a subject matter expert.
2011-11-07 04:40 AM
Thank you for your response. I agree with you that it's strange, although this is my very first experience with SnapDrive and SnapManager, so my opinion is based only on posts like yours and blogs that I've read.
I opened a Tech Support case (2002637676) several days ago, but the support offered has been deplorable. The assigned technician keeps asking questions and requesting log files that were already answered and attached in previous messages. Meanwhile time is passing and we don't have automated backups of our production database.
I'm really disappointed with SnapManager and Tech Support because, as I said before, one of the deciding factors in the purchase was the ability to take database backups in minutes without disrupting users, through NetApp FAS3240 Snapshots and the SnapDrive / SnapManager for SQL software, and it's not working...