Data Protection

Getting random timeout on PRE_CLONE_CREATE_CMD01

cferbrache
11,175 Views

Occasionally my clone process will timeout with the below highlighted error. I am running SC 3.5. I have attached my config file. Any help would be appreciated.

[Thu Jan 31 18:30:01 2013] INFO: Logfile timestamp: 20130131183001 [Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.out.20130116183001.log

[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.debug.20130116183001.log

[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.stderr.20130116183001.log

########## Parsing Environment Parameters ##########

########## PRE APPLICATION QUIESCE COMMANDS ########## [Thu Jan 31 18:30:01 2013] INFO: No commands defined

########## PRE APPLICATION QUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## Application quiesce ########## [Thu Jan 31 18:30:01 2013] [comet:9090(3.5.0.1)] INFO: Quiescing databases [Thu Jan 31 18:30:03 2013] [comet:9090(3.5.0.1)] INFO: Informix database quiesced [Thu Jan 31 18:30:03 2013] [comet:9090(3.5.0.1)] INFO: Quiescing database finished successfully

########## POST APPLICATION QUIESCE COMMANDS ########## [Thu Jan 31 18:30:03 2013] INFO: No commands defined

########## POST APPLICATION QUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## PRE NETAPP COMMANDS ########## [Thu Jan 31 18:30:03 2013] INFO: No commands defined

########## PRE NETAPP COMMANDS FINISHED SUCCESSFULLY ##########

########## Detecting Data OnTap mode for fas6240d ########## [Thu Jan 31 18:30:03 2013] INFO: Data OnTap 7 mode detected

########## Generating Info ASUP on fas6240d ########## [Thu Jan 31 18:30:03 2013] INFO: NetApp ASUP create on fas6240d finished successfully

########## Gathering Information for luns on fas6240d ##########

########## Gathering Information for luns on fas6240d ########## [Thu Jan 31 18:30:03 2013] INFO: NetApp Inventory of luns on fas6240d finished successfully

########## Gathering Information for fas6240d:d_pid_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pid_002 on fas6240d [Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pid_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pil_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pil_002 on fas6240d [Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pil_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pir_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pir_002 on fas6240d [Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pir_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pit_002 ########## [Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pit_002 on fas6240d [Thu Jan 31 18:30:04 2013] INFO: NetApp Snapshot Inventory of d_pit_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pia_003 ########## [Thu Jan 31 18:30:04 2013] INFO: Performing NetApp Snapshot Inventory for d_pia_003 on fas6240d [Thu Jan 31 18:30:04 2013] INFO: NetApp Snapshot Inventory of d_pia_003 on fas6240d completed Successfully

########## Running NetApp Snapshot Rename on Primary fas6240d ########## [Thu Jan 31 18:30:04 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pid_002 [Thu Jan 31 18:30:08 2013] INFO: NetApp Snapshot Rename for d_pid_002 on fas6240d completed Successfully [Thu Jan 31 18:30:08 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pil_002 [Thu Jan 31 18:30:09 2013] INFO: NetApp Snapshot Rename for d_pil_002 on fas6240d completed Successfully [Thu Jan 31 18:30:09 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pir_002 [Thu Jan 31 18:30:10 2013] INFO: NetApp Snapshot Rename for d_pir_002 on fas6240d completed Successfully [Thu Jan 31 18:30:10 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pit_002 [Thu Jan 31 18:30:11 2013] INFO: NetApp Snapshot Rename for d_pit_002 on fas6240d completed Successfully [Thu Jan 31 18:30:11 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pia_003 [Thu Jan 31 18:30:12 2013] INFO: NetApp Snapshot Rename for d_pia_003 on fas6240d completed Successfully

########## Creating snapshot(s) ########## [Thu Jan 31 18:30:12 2013] INFO: NetApp Snap Creator Framework 3.5.0 detected that SnapDrive is not being used. File system consistency cannot be guaranteed for SAN/iSAN environments

########## Starting Consistency Group Snapshot for fas6240d ########## [Thu Jan 31 18:30:12 2013] INFO: Starting NetApp Consistency Group Snapshot combsco_cl_tau-daily_recent for volumes d_pid_002,d_pil_002,d_pir_002,d_pit_002,d_pia_003 on fas6240d [Thu Jan 31 18:30:12 2013] INFO: NetApp Consistency Group Start for fas6240d volumes d_pid_002,d_pil_002,d_pir_002,d_pit_002,d_pia_003 succeeded!

########## Committing Consistency Group Snapshot for all filers ########## [Thu Jan 31 18:30:12 2013] INFO: Commiting NetApp Consistency Group Snapshot for fas6240d succeeded!

[Thu Jan 31 18:30:12 2013] INFO: Creation of NetApp Consistency Group Snapshot combsco_cl_tau-daily_recent on fas6240d Completed Successfully

########## PRE APPLICATION UNQUIESCE COMMANDS ########## [Thu Jan 31 18:30:12 2013] INFO: No commands defined

########## PRE APPLICATION UNQUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## Application unquiesce ########## [Thu Jan 31 18:30:12 2013] [comet:9090(3.5.0.1)] INFO: Unquiescing databases [Thu Jan 31 18:30:13 2013] [comet:9090(3.5.0.1)] INFO: Informix database unquiesced [Thu Jan 31 18:30:13 2013] [comet:9090(3.5.0.1)] INFO: Unquiescing database finished successfully

########## POST APPLICATION UNQUIESCE COMMANDS ########## [Thu Jan 31 18:30:13 2013] INFO: No commands defined

########## POST APPLICATION UNQUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## PRE CLONE CREATE ##########

########## Pre clone handling for plugin INFORMIX ##########

########## UMOUNT COMMANDS ##########

[Thu Jan 31 18:30:15 2013] INFO: No commands defined

########## UMOUNT COMMANDS FINISHED SUCCESSFULLY ##########

########## PRE CLONE COMMANDS ########## [Thu Jan 31 18:30:15 2013] INFO: Running pre clone command PRE_CLONE_CREATE_CMD01 [/usr/local/bin/preclone_taubsco.sh]

[Thu Jan 31 18:40:15 2013] ERROR: [scf-00103] Running pre clone command [/usr/local/bin/preclone_taubsco.sh] failed with exit code [1] and message [500 read timeout at /</usr/local/scServer3.5.0/snapcreator>SnapCreator/Agent/Remote.pm line 474 ]

########## PRE CLONE COMMANDS FINISHED WITH FAILURES ##########

########## PRE EXIT COMMANDS ##########

[Thu Jan 31 18:40:15 2013] INFO: No commands defined

########## PRE EXIT COMMANDS FINISHED SUCCESSFULLY ##########

19 REPLIES 19

ktenzer
11,105 Views

It appears the script /usr/local/bin/preclone_taubsco.sh is running longer than 60 seconds. What is wierd is you have SC_AGENT_TIMEOUT=600 so it should wait longer.

I am really not sure on this one. I would try with SC 3.6p1 and see if that helps. Could be a bug somewhere that we arent waiting longer than 60 seconds for SC_CLONE_TARGET.

In addtion you could run agent in multithreaded mode on taurus, see if that helps:

./snapcreator --start-multithreaded-agent 9090

My thought is maybe there are other operations happening so it is timing out.

Regards,

Keith

sivar
11,103 Views

1) You have mentioned randomly failing, but I have to still ask...whether the scAgent/config/agent.conf file has permissions to run /usr/local/bin/preclone_taubsco.sh

2) Please try bumping up the agent timeout settings as below.

SC_AGENT_TIMEOUT=900

SC_AGENT_UNQUIESCE_TIMEOUT=60

cferbrache
11,103 Views

Yes, the /usr/local/bin/preclone_taubsco.sh has permissions because this clone process runs twice a day and will run for days before blowing up. I will try changing these setting and see if that helps.

sivar
11,103 Views

Great. Thank you.

cferbrache
11,103 Views

Well the timeout value changes didn't help. My 18:30 run failed with the same issue.

sivar
11,103 Views

Ok. Thanks for letting us know.

I will research this further with the team.

Will you be able to stop the agent on taurus and restart it in multithreaded mode? (as Keith suggested above).

If you have any logs from the Operating System during that 18:30 run time frame, please forward to me. (also any agent logs from snapcreator).

sivar
11,103 Views

I reviewed the source code Remote.pm and my suspicion is that the agent is not returning it's version properly.

It could be that agent was not running during that time of the clone.

Are you manually restarting the agent every time after this error occurred?  or does it automatically work in the next run?

Also, please check if you can put the IP instead of the hostname in the below line, and let it run for few days.

From

SC_CLONE_TARGET=taurus:9090

To

SC_CLONE_TARGET=x.x.x.x:9090

cferbrache
11,103 Views

I never restart the agent. I just rerun the clone process and it works. I have changed SC_CLONE_TARGET to the IP address.

sivar
11,104 Views

Thank you. Please keep monitoring this for a week, and let us know if this still fail.

I will follow up on this within my team as well.

Thanks,

Siva Ramanathan

cferbrache
11,238 Views

Well the clone process blew up again in the preclone processing. The agent is running in multithread mode.

root     23811     1  0 Feb04 ?        00:00:01 ./snapcreator --start-multithreaded-agent 9090

There are no logs in the agent logs directory. How do I turn those on?

sivar
11,238 Views

Sorry to hear that. Please use the command mentioned below to start the agent in debug mode.

snapcreator --start-multithreaded-agent 9090--verbose --debug

cferbrache
11,239 Views

Well the clone blew up again last night. I had debug turned on as you specified but I don't see any logs in the /usr/local/scAgent3.5.0/logs.

sivar
10,110 Views

As Keith mentioned below, the agent logs creation is available only in snapcreator 3.6.

Will you be able to upgrade to Snapcreator 3.6 (both server and agent?)

Procedure is same like below.(thought it is written for 3.6 to 3.6p1 on Windows)

Backup entire snapcreator 3.6.0 directory to some place on C:\ drive

Install snapcreator 3.6p1 --> this will remove the 3.6.0 binaries and files/folders

    Don't select start as service yet during the install

- Copy 'scServer3.6.0\gui\snapcreator' directory to bring over the schedules to 'scServer3.6p1\gui\snapcreator'

    (NOTE: ONLY the snapcreator directory not the entire gui directory)

- Copy 'scServer3.6.0\configs' to bring over the configs to 'scServer3.6p1\configs'

- Copy 'scServer3.6.0\logs' to bring over the logs to 'scServer3.6p1\logs'

C:\Program Files\NetApp\NetApp_Snap_Creator_Framework\scServer3.6p1\gui>java -jar snapcreator.jar -upgrade

Starting Derby Server

Got an embedded connection.

Testing embedded connection by executing a sample query

number of rows in sys.systables = 44

Updating table security_actions

Updating the scheduler triggers.

Shuting Down Derby Server

C:\Program Files\NetApp\NetApp_Snap_Creator_Framework\scServer3.6p1\gui>java -jar snapcreator.jar

Starting Derby Server

Got an embedded connection.

Testing embedded connection by executing a sample query

number of rows in sys.systables = 44

Starting Jetty Server

Jetty Server Started on port: 8080

ktenzer
11,238 Views

SC_AGENT_LOG=Y is the parameter but I believe this was only available with SC 3.5 and higher. Look in config if you see it, its there else not available in your version.

Keith

cferbrache
11,238 Views

I am on SC 3.5.0 but I don't see that in the config nor in the default.conf.

ktenzer
11,238 Views

Ok then that must have got added in SC 3.6 not 3.5

Your only option is to run agent with --verbose --debug and wait for error and then capture anything

Keith

cferbrache
10,109 Views

Could a 17 minute time difference between the SnapCreator server and the agent server cause this timeout error? I just determined that the time was not syncing on the agent server.

sivar
10,109 Views

I am not aware of any server/agent time difference checking mechanism.

It may help if you could sync the time on agent and server.

cferbrache
10,109 Views

After syncing the time between the SnapCreator server and the agent server I seldom have an issue with the timeout. They still happen occasionally but not like they were.

Public