Data Protection
Data Protection
Occasionally my clone process will timeout with the below highlighted error. I am running SC 3.5. I have attached my config file. Any help would be appreciated.
[Thu Jan 31 18:30:01 2013] INFO: Logfile timestamp: 20130131183001 [Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.out.20130116183001.log
[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.debug.20130116183001.log
[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.stderr.20130116183001.log
########## Parsing Environment Parameters ##########
########## PRE APPLICATION QUIESCE COMMANDS ########## [Thu Jan 31 18:30:01 2013] INFO: No commands defined
########## PRE APPLICATION QUIESCE COMMANDS FINISHED SUCCESSFULLY ##########
########## Application quiesce ########## [Thu Jan 31 18:30:01 2013] [comet:9090(3.5.0.1)] INFO: Quiescing databases [Thu Jan 31 18:30:03 2013] [comet:9090(3.5.0.1)] INFO: Informix database quiesced [Thu Jan 31 18:30:03 2013] [comet:9090(3.5.0.1)] INFO: Quiescing database finished successfully
########## POST APPLICATION QUIESCE COMMANDS ########## [Thu Jan 31 18:30:03 2013] INFO: No commands defined
########## POST APPLICATION QUIESCE COMMANDS FINISHED SUCCESSFULLY ##########
########## PRE NETAPP COMMANDS ########## [Thu Jan 31 18:30:03 2013] INFO: No commands defined
########## PRE NETAPP COMMANDS FINISHED SUCCESSFULLY ##########
########## Detecting Data OnTap mode for fas6240d ########## [Thu Jan 31 18:30:03 2013] INFO: Data OnTap 7 mode detected
########## Generating Info ASUP on fas6240d ########## [Thu Jan 31 18:30:03 2013] INFO: NetApp ASUP create on fas6240d finished successfully
########## Gathering Information for luns on fas6240d ##########
########## Gathering Information for luns on fas6240d ########## [Thu Jan 31 18:30:03 2013] INFO: NetApp Inventory of luns on fas6240d finished successfully
########## Gathering Information for fas6240d:d_pid_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pid_002 on fas6240d [Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pid_002 on fas6240d completed Successfully
########## Gathering Information for fas6240d:d_pil_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pil_002 on fas6240d [Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pil_002 on fas6240d completed Successfully
########## Gathering Information for fas6240d:d_pir_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pir_002 on fas6240d [Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pir_002 on fas6240d completed Successfully
########## Gathering Information for fas6240d:d_pit_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pit_002 on fas6240d [Thu Jan 31 18:30:04 2013] INFO: NetApp Snapshot Inventory of d_pit_002 on fas6240d completed Successfully
########## Gathering Information for fas6240d:d_pia_003 ########## [Thu Jan 31 18:30:04 2013] INFO: Performing NetApp Snapshot Inventory for d_pia_003 on fas6240d [Thu Jan 31 18:30:04 2013] INFO: NetApp Snapshot Inventory of d_pia_003 on fas6240d completed Successfully
########## Running NetApp Snapshot Rename on Primary fas6240d ########## [Thu Jan 31 18:30:04 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pid_002 [Thu Jan 31 18:30:08 2013] INFO: NetApp Snapshot Rename for d_pid_002 on fas6240d completed Successfully [Thu Jan 31 18:30:08 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pil_002 [Thu Jan 31 18:30:09 2013] INFO: NetApp Snapshot Rename for d_pil_002 on fas6240d completed Successfully [Thu Jan 31 18:30:09 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pir_002 [Thu Jan 31 18:30:10 2013] INFO: NetApp Snapshot Rename for d_pir_002 on fas6240d completed Successfully [Thu Jan 31 18:30:10 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pit_002 [Thu Jan 31 18:30:11 2013] INFO: NetApp Snapshot Rename for d_pit_002 on fas6240d completed Successfully [Thu Jan 31 18:30:11 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pia_003 [Thu Jan 31 18:30:12 2013] INFO: NetApp Snapshot Rename for d_pia_003 on fas6240d completed Successfully
########## Creating snapshot(s) ########## [Thu Jan 31 18:30:12 2013] INFO: NetApp Snap Creator Framework 3.5.0 detected that SnapDrive is not being used. File system consistency cannot be guaranteed for SAN/iSAN environments
########## Starting Consistency Group Snapshot for fas6240d ########## [Thu Jan 31 18:30:12 2013] INFO: Starting NetApp Consistency Group Snapshot combsco_cl_tau-daily_recent for volumes d_pid_002,d_pil_002,d_pir_002,d_pit_002,d_pia_003 on fas6240d [Thu Jan 31 18:30:12 2013] INFO: NetApp Consistency Group Start for fas6240d volumes d_pid_002,d_pil_002,d_pir_002,d_pit_002,d_pia_003 succeeded!
########## Committing Consistency Group Snapshot for all filers ########## [Thu Jan 31 18:30:12 2013] INFO: Commiting NetApp Consistency Group Snapshot for fas6240d succeeded!
[Thu Jan 31 18:30:12 2013] INFO: Creation of NetApp Consistency Group Snapshot combsco_cl_tau-daily_recent on fas6240d Completed Successfully
########## PRE APPLICATION UNQUIESCE COMMANDS ########## [Thu Jan 31 18:30:12 2013] INFO: No commands defined
########## PRE APPLICATION UNQUIESCE COMMANDS FINISHED SUCCESSFULLY ##########
########## Application unquiesce ########## [Thu Jan 31 18:30:12 2013] [comet:9090(3.5.0.1)] INFO: Unquiescing databases [Thu Jan 31 18:30:13 2013] [comet:9090(3.5.0.1)] INFO: Informix database unquiesced [Thu Jan 31 18:30:13 2013] [comet:9090(3.5.0.1)] INFO: Unquiescing database finished successfully
########## POST APPLICATION UNQUIESCE COMMANDS ########## [Thu Jan 31 18:30:13 2013] INFO: No commands defined
########## POST APPLICATION UNQUIESCE COMMANDS FINISHED SUCCESSFULLY ##########
########## PRE CLONE CREATE ##########
########## Pre clone handling for plugin INFORMIX ##########
########## UMOUNT COMMANDS ##########
[Thu Jan 31 18:30:15 2013] INFO: No commands defined
########## UMOUNT COMMANDS FINISHED SUCCESSFULLY ##########
########## PRE CLONE COMMANDS ########## [Thu Jan 31 18:30:15 2013] INFO: Running pre clone command PRE_CLONE_CREATE_CMD01 [/usr/local/bin/preclone_taubsco.sh]
[Thu Jan 31 18:40:15 2013] ERROR: [scf-00103] Running pre clone command [/usr/local/bin/preclone_taubsco.sh] failed with exit code [1] and message [500 read timeout at /</usr/local/scServer3.5.0/snapcreator>SnapCreator/Agent/Remote.pm line 474 ]
########## PRE CLONE COMMANDS FINISHED WITH FAILURES ##########
########## PRE EXIT COMMANDS ##########
[Thu Jan 31 18:40:15 2013] INFO: No commands defined
########## PRE EXIT COMMANDS FINISHED SUCCESSFULLY ##########
It appears the script /usr/local/bin/preclone_taubsco.sh is running longer than 60 seconds. What is wierd is you have SC_AGENT_TIMEOUT=600 so it should wait longer.
I am really not sure on this one. I would try with SC 3.6p1 and see if that helps. Could be a bug somewhere that we arent waiting longer than 60 seconds for SC_CLONE_TARGET.
In addtion you could run agent in multithreaded mode on taurus, see if that helps:
./snapcreator --start-multithreaded-agent 9090
My thought is maybe there are other operations happening so it is timing out.
Regards,
Keith
1) You have mentioned randomly failing, but I have to still ask...whether the scAgent/config/agent.conf file has permissions to run /usr/local/bin/preclone_taubsco.sh
2) Please try bumping up the agent timeout settings as below.
SC_AGENT_TIMEOUT=900
SC_AGENT_UNQUIESCE_TIMEOUT=60
Yes, the /usr/local/bin/preclone_taubsco.sh has permissions because this clone process runs twice a day and will run for days before blowing up. I will try changing these setting and see if that helps.
Great. Thank you.
Well the timeout value changes didn't help. My 18:30 run failed with the same issue.
Ok. Thanks for letting us know.
I will research this further with the team.
Will you be able to stop the agent on taurus and restart it in multithreaded mode? (as Keith suggested above).
If you have any logs from the Operating System during that 18:30 run time frame, please forward to me. (also any agent logs from snapcreator).
I reviewed the source code Remote.pm and my suspicion is that the agent is not returning it's version properly.
It could be that agent was not running during that time of the clone.
Are you manually restarting the agent every time after this error occurred? or does it automatically work in the next run?
Also, please check if you can put the IP instead of the hostname in the below line, and let it run for few days.
From
SC_CLONE_TARGET=taurus:9090
To
SC_CLONE_TARGET=x.x.x.x:9090
I never restart the agent. I just rerun the clone process and it works. I have changed SC_CLONE_TARGET to the IP address.
Thank you. Please keep monitoring this for a week, and let us know if this still fail.
I will follow up on this within my team as well.
Thanks,
Siva Ramanathan
Well the clone process blew up again in the preclone processing. The agent is running in multithread mode.
root 23811 1 0 Feb04 ? 00:00:01 ./snapcreator --start-multithreaded-agent 9090
There are no logs in the agent logs directory. How do I turn those on?
Sorry to hear that. Please use the command mentioned below to start the agent in debug mode.
snapcreator --start-multithreaded-agent 9090--verbose --debug
Well the clone blew up again last night. I had debug turned on as you specified but I don't see any logs in the /usr/local/scAgent3.5.0/logs.
As Keith mentioned below, the agent logs creation is available only in snapcreator 3.6.
Will you be able to upgrade to Snapcreator 3.6 (both server and agent?)
Procedure is same like below.(thought it is written for 3.6 to 3.6p1 on Windows)
Backup entire snapcreator 3.6.0 directory to some place on C:\ drive
Install snapcreator 3.6p1 --> this will remove the 3.6.0 binaries and files/folders
Don't select start as service yet during the install
- Copy 'scServer3.6.0\gui\snapcreator' directory to bring over the schedules to 'scServer3.6p1\gui\snapcreator'
(NOTE: ONLY the snapcreator directory not the entire gui directory)
- Copy 'scServer3.6.0\configs' to bring over the configs to 'scServer3.6p1\configs'
- Copy 'scServer3.6.0\logs' to bring over the logs to 'scServer3.6p1\logs'
C:\Program Files\NetApp\NetApp_Snap_Creator_Framework\scServer3.6p1\gui>java -jar snapcreator.jar -upgrade
Starting Derby Server
Got an embedded connection.
Testing embedded connection by executing a sample query
number of rows in sys.systables = 44
Updating table security_actions
Updating the scheduler triggers.
Shuting Down Derby Server
C:\Program Files\NetApp\NetApp_Snap_Creator_Framework\scServer3.6p1\gui>java -jar snapcreator.jar
Starting Derby Server
Got an embedded connection.
Testing embedded connection by executing a sample query
number of rows in sys.systables = 44
Starting Jetty Server
Jetty Server Started on port: 8080
SC_AGENT_LOG=Y is the parameter but I believe this was only available with SC 3.5 and higher. Look in config if you see it, its there else not available in your version.
Keith
I am on SC 3.5.0 but I don't see that in the config nor in the default.conf.
Ok then that must have got added in SC 3.6 not 3.5
Your only option is to run agent with --verbose --debug and wait for error and then capture anything
Keith
Could a 17 minute time difference between the SnapCreator server and the agent server cause this timeout error? I just determined that the time was not syncing on the agent server.
I am not aware of any server/agent time difference checking mechanism.
It may help if you could sync the time on agent and server.
After syncing the time between the SnapCreator server and the agent server I seldom have an issue with the timeout. They still happen occasionally but not like they were.
