SMSQL plugin backups work only once, then time out and the agent becomes unreachable

mmodi
We receive the error even though the agent timeout settings exceed the SMSQL job completion time:

============ Timeout (Quiesce = 600, UnQuiesce = 601)

SC_AGENT_TIMEOUT=600

SC_AGENT_UNQUIESCE_TIMEOUT=601

DEBUG: Executing command [%SystemRoot%\Sysnative\WindowsPowerShell\v1.0\powershell.exe -psconsolefile "C:\Program Files\NetApp\SnapManager for SQL Server\SmsqlShell.psc1" -command "new-backup -Server 'SQLHOST\PRDINST1' -RetainBackups 7 -LogBkup -BkupSIF -RetainSnapofSnapInfo 8 -TruncateSqlLog $true -ManagementGroup Daily -ArchiveBackup -ArchivedBackupRetention Daily"]

============ SC Server Log File started: 16:05

E:\NetApp_Snap_Creator_Framework\scServer3.5.0\logs\SQLHOST\new_sql.out.20120403160555

[Tue Apr  3 16:05:55 2012] INFO: Logfile timestamp: 20120403160555

============ Agent lock file created 16:05

C:\Program Files (x86)\Netapp\NetApp_Snap_Creator_Framework\scAgent3.5.0\SQLHOST_new_sql_quiesce.lck

============ SMSQL Job Started 16:07

\\127.0.0.1\SMSQLReportFolder\Backup [SQLHOST]\04-03-2012_16.07.22

[16:07:22.521]  Backup Time Stamp: 04-03-2012_16.07.22

============ SMSQL Job Ended 16:13

[16:13:04.823]  *** SNAPMANAGER BACKUP JOB ENDED AT: [04-03-2012 16.13.04]

============ Agent lock file deleted ~16:15 (10 minutes later; based on the debug logs below, this appears to be due to the 601-second unquiesce timeout)

============ SC debug log errors:

[Tue Apr  3 16:05:55 2012] INFO: Starting watchdog with [-2596], forced unquiesce timeout [601] second(s)

[Tue Apr  3 16:15:56 2012] INFO: Skipping unquiesce, nothing needed for SMSQL integration

============ SC out log errors:

########## Application quiesce ##########

[Tue Apr  3 16:15:55 2012] ERROR: 500 read timeout at /<E:\NetApp_Snap_Creator_Framework\scServer3.5.0\snapcreator.exe>SnapCreator/Agent/Remote.pm line 474

[Tue Apr  3 16:15:55 2012] [10.128.115.125:9090(3.5.0.1)] ERROR: [scf-00053] Application quiesce for plugin smsql failed with exit code 1, Exiting!

########## Application unquiesce ##########

[Tue Apr  3 16:25:55 2012] ERROR: No valid response

[Tue Apr  3 16:25:55 2012] ERROR: [scf-00054] Application unquiesce for plugin smsql failed with exit code 1, Exiting!
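The timestamps in these logs can be checked with a little arithmetic. A minimal Python sketch, assuming the 16:25:55 unquiesce error is simply the same 600-second server timeout applied a second time (the logs do not state this explicitly):

```python
from datetime import datetime, timedelta

# Times of day copied from the log excerpts above (Tue Apr 3 2012).
FMT = "%H:%M:%S"
sc_start = datetime.strptime("16:05:55", FMT)     # SC run and watchdog start
smsql_start = datetime.strptime("16:07:22", FMT)  # SMSQL backup job started
smsql_end = datetime.strptime("16:13:04", FMT)    # SMSQL backup job ended

SC_AGENT_TIMEOUT = 600            # seconds the server waits for the agent
SC_AGENT_UNQUIESCE_TIMEOUT = 601  # seconds before the watchdog forces an unquiesce

job_seconds = int((smsql_end - smsql_start).total_seconds())
quiesce_timeout_at = sc_start + timedelta(seconds=SC_AGENT_TIMEOUT)
watchdog_at = sc_start + timedelta(seconds=SC_AGENT_UNQUIESCE_TIMEOUT)
# Assumption: the unquiesce call waits another SC_AGENT_TIMEOUT after the quiesce error.
unquiesce_timeout_at = quiesce_timeout_at + timedelta(seconds=SC_AGENT_TIMEOUT)

print("SMSQL job duration (s):", job_seconds)
print("Quiesce read timeout:  ", quiesce_timeout_at.strftime(FMT))
print("Watchdog unquiesce:    ", watchdog_at.strftime(FMT))
print("Unquiesce read timeout:", unquiesce_timeout_at.strftime(FMT))
```

This gives a 342-second SMSQL job, well inside the 600-second timeout, with computed deadlines of 16:15:55, 16:15:56 and 16:25:55, matching the logged errors. That pattern suggests the agent never returned a response for the finished job, rather than the job itself running too long.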

====================================================================================

Based on the Admin Guide resolution for error code scf-00053:

•    Application quiesce failed due to application error. Check logs and application settings. To ignore application errors and proceed with backup you can set “APP_IGNORE_ERROR=Y”

########## Application quiesce ##########

[Tue Apr  3 16:42:04 2012] ERROR: 500 read timeout at /<E:\NetApp_Snap_Creator_Framework\scServer3.5.0\snapcreator.exe>SnapCreator/Agent/Remote.pm line 474

[Tue Apr  3 16:42:07 2012] INFO: Creating OM Event (script:critical-event) on 10.128.115.121

[Tue Apr  3 16:42:07 2012] INFO: OM Event (script:critical-event) on 10.128.115.121 created successfully

########## Application unquiesce ##########

[Tue Apr  3 16:52:07 2012] ERROR: No valid response

[Tue Apr  3 16:52:07 2012] INFO: Creating OM Event (script:critical-event) on 10.128.115.121

[Tue Apr  3 16:52:07 2012] INFO: OM Event (script:critical-event) on 10.128.115.121 created successfully

########## Running NetApp Snapshot Delete on Primary emlabprdfil1 ##########

Why is SC doing this? (It requires type Daily, but I have removed retention; pending test.)

[Tue Apr  3 16:52:22 2012] WARN: More than 7 NetApp snapshots exist, older snapshots of emlabprdfil1:site_a_sql_prdinst1_data will be automatically deleted!

########## NetApp Snap Creator Framework 3.5.0 completed with errors ##########
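The WARN line describes count-based retention: once more than 7 snapshots of the volume exist, the oldest ones are deleted. A minimal Python sketch of that rule, assuming it keeps the newest N snapshots by creation time (Snap Creator's own implementation is not shown in this thread, and the snapshot names below are hypothetical):

```python
from datetime import datetime


def snapshots_to_delete(snapshots, retain):
    """Return the names of snapshots older than the newest `retain` ones.

    `snapshots` is a list of (name, created) tuples in any order.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [name for name, _ in newest_first[retain:]]


# Nine hypothetical daily snapshots, one per day from Apr 1 to Apr 9.
snaps = [(f"daily.{day}", datetime(2012, 4, day)) for day in range(1, 10)]
print(snapshots_to_delete(snaps, retain=7))  # ['daily.2', 'daily.1']
```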

•    Further, in addition to setting “APP_IGNORE_ERROR=Y”, I changed “SC_AGENT_UNQUIESCE_TIMEOUT=” to empty, since the same error kept repeating.

[Tue Apr  3 16:55:45 2012] INFO: Starting watchdog with [-1688], forced unquiesce timeout [605] second(s)

The same error repeats after the default 605 seconds.

====================================================================================

Option 1 (using the SMSQL plugin):

########## Application quiesce ##########

[Tue Apr  3 16:42:04 2012] ERROR: 500 read timeout at /<E:\NetApp_Snap_Creator_Framework\scServer3.5.0\snapcreator.exe>SnapCreator/Agent/Remote.pm line 474

Option 2 (using the SMSQL CLI as a snap create command):

########## SNAPSHOT CREATE COMMANDS ##########

[Thu Apr 12 09:02:27 2012] ERROR: [scf-00103] Running snapshot create command

[C:\windows\system32\WindowsPowerShell\v1.0\powershell.exe -psconsolefile "C:\Program Files\NetApp\SnapManager for SQL Server\smsqlShell.psc1" -Command new-backup -Server 'SQLHOST\PRDINST1' -RetainBackups 7 -LogBkup -BkupSIF -RetainSnapofSnapInfo 8 -TruncateSqlLog $true -ManagementGroup Daily -ArchiveBackup -ArchivedBackupRetention Daily]

failed with exit code [1] and message [500 read timeout at /<E:\NetApp_Snap_Creator_Framework\scServer3.5.0\snapcreator.exe>SnapCreator/Agent/Remote.pm line 474]


Accepted solution

ktenzer
OK, a couple of problems:

1) You have TRANSPORT=HTTP and OM_PORT=8488. Please change to OM_PORT=8088; you can't tell SC to use HTTP and then specify an HTTPS port. If you want to use HTTPS, change to TRANSPORT=HTTPS and PORT=443.

2) Don't run scServer or scAgent as a service. If you only have SC Server, test from the CLI; if you have scAgent, run the agent manually (stop the service and run the agent from the CLI with "snapcreator.exe --start-agent"). I think this is your issue.

3) Ensure communication works between scServer and scAgent with "telnet 10.128.115.125 9090"; if you get a connection, things are working.

Regards,

Keith
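Points 1 and 3 can be checked with a short Python sketch. It assumes, from point 1, that OM_PORT 8088 goes with TRANSPORT=HTTP and 8488 with TRANSPORT=HTTPS. `om_port_mismatch` and `can_connect` are hypothetical helper names, and `can_connect` is only a TCP connect (the same test as telnet), not a Snap Creator health check.

```python
import socket

# OnCommand ports as described in point 1 (assumed mapping).
OM_PORTS = {"HTTP": 8088, "HTTPS": 8488}


def om_port_mismatch(transport, om_port):
    """Return an error message if OM_PORT does not match TRANSPORT, else None."""
    expected = OM_PORTS.get(transport.upper())
    if expected is None:
        return f"unknown TRANSPORT={transport}"
    if om_port != expected:
        return f"TRANSPORT={transport} expects OM_PORT={expected}, got {om_port}"
    return None


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with values from this thread:
#   om_port_mismatch("HTTP", 8488)       -> message asking for OM_PORT=8088
#   can_connect("10.128.115.125", 9090)  -> True if the agent port accepts connections
```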

7 Replies

ktenzer
Hi There,

Yes, Snap Creator supports not only many applications but also running SnapManagers (SMs), and SC_AGENT_TIMEOUT is a very important setting. It tells the Snap Creator server how long to wait before breaking the connection and throwing an error. The Snap Creator agent, however, will continue with its task unless SC_AGENT_WATCHDOG=Y; in that case, when SC_AGENT_UNQUIESCE_TIMEOUT expires, the agent will perform an unquiesce. It is important to understand these mechanics.

When SC runs a SnapManager, the unquiesce is NOT important, because we simply run the SM and wait for it to complete. Therefore SC_AGENT_WATCHDOG=N should be set, and you also don't need to set SC_AGENT_UNQUIESCE_TIMEOUT. For applications where SC has direct integration these parameters are important, but not for SnapManager integration.

In the next release, SC 3.6, we added plugin validation, so we can now define required settings for plugins and validate that things are set up to meet plugin requirements.
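The kind of check described here can be sketched in Python. This is a hypothetical illustration, not Snap Creator 3.6's plugin validation: it assumes a config file of KEY=VALUE lines with `#` comments, and it flags the two settings discussed for SnapManager integration (SC_AGENT_WATCHDOG=Y, and a non-empty SC_AGENT_UNQUIESCE_TIMEOUT).

```python
def parse_config(text):
    """Parse KEY=VALUE lines; blank lines, '#' comments and lines without '=' are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_snapmanager_settings(settings):
    """Return warnings for settings that conflict with the SnapManager advice above."""
    warnings = []
    if settings.get("SC_AGENT_WATCHDOG", "").upper() == "Y":
        warnings.append("SC_AGENT_WATCHDOG=Y: set it to N for SnapManager integration")
    if settings.get("SC_AGENT_UNQUIESCE_TIMEOUT"):
        warnings.append("SC_AGENT_UNQUIESCE_TIMEOUT is not needed when the watchdog is off")
    return warnings


config = """
SC_AGENT_TIMEOUT=600
SC_AGENT_WATCHDOG=Y
SC_AGENT_UNQUIESCE_TIMEOUT=601
"""
for warning in check_snapmanager_settings(parse_config(config)):
    print(warning)
```

An empty value such as "SC_AGENT_UNQUIESCE_TIMEOUT=" parses to an empty string and is not flagged.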

As for SC_AGENT_TIMEOUT: if it takes 10 minutes for the SM job to finish, I would set it to 15 minutes (SC_AGENT_TIMEOUT=900). I would definitely pad the number by 5 minutes, as SnapManager can take a while.
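The padding rule is simple arithmetic. A small Python sketch, assuming you measure the SM job time in seconds and round it up to whole minutes before adding the 5-minute pad (the rounding is a choice made here, not something stated in the reply; `padded_agent_timeout` is a hypothetical helper name):

```python
import math


def padded_agent_timeout(job_seconds, pad_minutes=5):
    """Suggest SC_AGENT_TIMEOUT in seconds: job time rounded up to whole minutes, plus a pad."""
    job_minutes = math.ceil(job_seconds / 60)
    return (job_minutes + pad_minutes) * 60


print(f"SC_AGENT_TIMEOUT={padded_agent_timeout(10 * 60)}")  # SC_AGENT_TIMEOUT=900
```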

Hope this helps

Keith

mmodi
Hi Keith,

Thanks for your response. I have tried increasing the timeout value and disabling the agent watchdog, but I still observe the same behavior.

Can you please have a brief look at the attached config and advise if I am missing something here? The first job completes successfully, but then the agent hangs and requires a restart before the next job can run.

########## Application quiesce ##########

[Mon Apr 16 10:01:36 2012] ERROR: No valid response

[Mon Apr 16 10:01:36 2012] ERROR: [scf-00053] Application quiesce for plugin smsql failed with exit code 1, Exiting!

########## Application unquiesce ##########

[Mon Apr 16 10:18:16 2012] ERROR: No valid response

[Mon Apr 16 10:18:16 2012] ERROR: [scf-00054] Application unquiesce for plugin smsql failed with exit code 1, Exiting!

########## PRE EXIT COMMANDS ##########

[Mon Apr 16 10:18:16 2012] INFO: No commands defined

########## PRE EXIT COMMANDS FINISHED SUCCESSFULLY ##########

[Mon Apr 16 10:18:16 2012] ERROR: validation of target 10.128.115.121 failed [No response received at /<E:\NetApp_Snap_Creator_Framework\scServer3.5.0\snapcreator.exe>SnapCreator/ZAPIExecutor/DfmZAPIExecutor.pm line 65.]

[Mon Apr 16 10:18:16 2012] ERROR: No response received at /<E:\NetApp_Snap_Creator_Framework\scServer3.5.0\snapcreator.exe>SnapCreator/ZAPIExecutor/DfmZAPIExecutor.pm line 65.

Thanks

Modi

mmodi
Hi Keith,

Is point 2 addressed in SC 3.6?

I shall test that and get back to you on whether it works, or whether it is the only current workaround.

Thanks

Modi

ktenzer
Yes, point 2 should be addressed in 3.6. It was a bug in 3.5: if the agent or server runs as a service, the SMSQL backup could hang.

Regards,

Keith

mmodi
Thanks Keith, you were spot on: the issue is resolved by running the agent manually (stop the service and run the agent from the CLI with "snapcreator.exe --start-agent").

Cheers,

Modi

ktenzer
Thanks for getting back to me, and glad the workaround worked. Again, this is a bug that should be fixed in SC 3.6. The community version will be released next week, so check www.snapcreator.com -> Downloads -> Community and you can test it out.

Regards,

Keith
