SnapCreator 3.4 Oracle Quiesce failure

jntballard

I have several DBs configured from a centralized SnapCreator server that connects to remote DBs running 11g R2, and they are functioning without any issues. However, one of the DBs (running 10.2.0.4) fails immediately at the beginning of the quiesce process with the following error:

########## Application quiesce ##########

[Mon Jul 18 10:59:27 2011] ERROR: 500 read timeout at /</usr/local/netapp/NTAPscreator/scServer3.4.0/snapcreator>SnapCreator/Agent/Remote.pm line 418

[Mon Jul 18 10:59:27 2011] [bighorn:9090(3.4.0.1)] ERROR: [scf-00053] Application quiesce for plugin oracle failed with exit code 1, Exiting!

The error log reports the following:

[Mon Jul 18 10:59:27 2011] [scf-00053] Application quiesce for plugin oracle failed with exit code 1, Exiting!

[Mon Jul 18 11:16:11 2011] [scf-00053] Application quiesce for plugin oracle failed with exit code 100, Exiting!

proman:/SC/logs/DCR_DB1 #

On the Oracle database server the port is listening, and when I test the agent connectivity (via the GUI), it is successful:

      *.9090               *.*                0      0 49152      0 LISTEN

199.249.215.37.9090  199.249.215.166.45378  7544      0 49232      0 CLOSE_WAIT
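For anyone repeating the connectivity check outside the GUI, here is a minimal sketch of a plain TCP probe. The helper name `port_is_open` is hypothetical and is not part of SnapCreator. A successful connect only shows that something is listening on the port; it says nothing about how long the agent takes to answer a quiesce request.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

With the agent host and port from the log above, the call would be `port_is_open("bighorn", 9090)`.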

I have tried both the multi-threaded agent and the single-threaded agent, with the same results.

Any suggestions on how to troubleshoot this further? The error gives little detail, and nothing shows up in the DEBUG logs...

Thanks,

Jerry


ktenzer

Hi Jerry,

You are getting a read timeout. This means SC_AGENT_TIMEOUT was reached while scServer was waiting on scAgent, so scServer terminated the connection.

I would do the following:

1. Increase SC_AGENT_TIMEOUT in this config

2. Run the agent in debug mode: ./snapcreator --start-agent 9090 --verbose --debug

Try to determine what is taking so long.

scServer sends the quiesce to the agent, the agent performs the quiesce, and messages are logged when it finishes. If scServer terminates, scAgent will continue, so just wait until it comes back and look at the times to figure out why the quiesce for this DB is so slow. Make sure you have the WATCHDOG parameter enabled; this ensures the unquiesce happens on scAgent if scServer terminates due to a timeout.
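To make "look at times" concrete, here is a minimal sketch that pulls the bracketed timestamps out of log lines and prints the gap between consecutive entries. It assumes the `[Mon Jul 18 10:59:27 2011]` format seen in this thread and an English locale for day and month names. The function names are hypothetical, not part of SnapCreator.

```python
import re
from datetime import datetime

# Matches timestamps such as "[Mon Jul 18 10:59:27 2011]".
STAMP = re.compile(r"\[(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\]")


def parse_stamps(lines):
    """Yield a datetime for each line that carries a bracketed timestamp."""
    for line in lines:
        match = STAMP.search(line)
        if match:
            yield datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def gaps(lines):
    """Return the seconds elapsed between consecutive timestamped lines."""
    stamps = list(parse_stamps(lines))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The two error-log entries quoted earlier in the thread.
sample = [
    "[Mon Jul 18 10:59:27 2011] [scf-00053] Application quiesce for plugin oracle failed with exit code 1, Exiting!",
    "[Mon Jul 18 11:16:11 2011] [scf-00053] Application quiesce for plugin oracle failed with exit code 100, Exiting!",
]
print(gaps(sample))
```

Running the same function over the agent's debug output should show which step accounts for most of the elapsed time.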

Let us know

Keith

jntballard

Thanks Keith,

I noticed that on the database server it would leave the "DCR_DB1_DCR_DB_ARCH_quiesce.lck" file behind, and the job would fail with the following error:

########## Application quiesce ##########

[Tue Jul 19 02:20:31 2011] ERROR: 500 read timeout at /</usr/local/netapp/NTAPscreator/scServer3.4.0/snapcreator>SnapCreator/Agent/Remote.pm line 418

[Tue Jul 19 02:20:31 2011] [bighorn:9090(3.4.0.1)] ERROR: [scf-00053] Application quiesce for plugin oracle failed with exit code 1, Exiting!

I have now enabled the WATCHDOG parameter as you suggested and increased SC_AGENT_TIMEOUT to 300 seconds.

Until I logged in and removed the "quiesce.lck" file, every job would fail, of course, indicating there was already a "quiesce" operation in progress.

Since this job was for ARCHIVE only, does the WATCHDOG remove the ".lck" files? And if the database is actually in hot backup mode (i.e. quiesced), will the WATCHDOG try to ensure the DB is unquiesced?
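Whether the WATCHDOG cleans up the lock file is not settled in this thread. In the meantime, a minimal sketch like the following could flag leftover lock files before the next job runs. The `*_quiesce.lck` pattern is taken from the file name above. The function name, the directory argument and the use of the 300-second timeout as the age threshold are assumptions for illustration. The sketch only reports files and does not delete anything.

```python
import time
from pathlib import Path


def stale_locks(directory, max_age_seconds=300, pattern="*_quiesce.lck"):
    """Return lock files in directory last modified more than max_age_seconds ago."""
    now = time.time()
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        if now - path.stat().st_mtime > max_age_seconds
    )
```

The directory to pass in is wherever the agent writes its lock files on the database server; that location is not given in this thread.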

Thanks -

Jerry
