Getting random timeout on PRE_CLONE_CREATE_CMD01

Occasionally my clone process will time out with the error highlighted below. I am running SC 3.5. I have attached my config file. Any help would be appreciated.

[Thu Jan 31 18:30:01 2013] INFO: Logfile timestamp: 20130131183001
[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.out.20130116183001.log

[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.debug.20130116183001.log

[Thu Jan 31 18:30:01 2013] INFO: Removing log combsco_cl_taurus.stderr.20130116183001.log

########## Parsing Environment Parameters ##########

########## PRE APPLICATION QUIESCE COMMANDS ##########
[Thu Jan 31 18:30:01 2013] INFO: No commands defined

########## PRE APPLICATION QUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## Application quiesce ##########
[Thu Jan 31 18:30:01 2013] [comet:9090(3.5.0.1)] INFO: Quiescing databases
[Thu Jan 31 18:30:03 2013] [comet:9090(3.5.0.1)] INFO: Informix database quiesced
[Thu Jan 31 18:30:03 2013] [comet:9090(3.5.0.1)] INFO: Quiescing database finished successfully

########## POST APPLICATION QUIESCE COMMANDS ##########
[Thu Jan 31 18:30:03 2013] INFO: No commands defined

########## POST APPLICATION QUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## PRE NETAPP COMMANDS ##########
[Thu Jan 31 18:30:03 2013] INFO: No commands defined

########## PRE NETAPP COMMANDS FINISHED SUCCESSFULLY ##########

########## Detecting Data OnTap mode for fas6240d ##########
[Thu Jan 31 18:30:03 2013] INFO: Data OnTap 7 mode detected

########## Generating Info ASUP on fas6240d ##########
[Thu Jan 31 18:30:03 2013] INFO: NetApp ASUP create on fas6240d finished successfully

########## Gathering Information for luns on fas6240d ##########
[Thu Jan 31 18:30:03 2013] INFO: NetApp Inventory of luns on fas6240d finished successfully

########## Gathering Information for fas6240d:d_pid_002 ##########
[Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pid_002 on fas6240d
[Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pid_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pil_002 ##########
[Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pil_002 on fas6240d
[Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pil_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pir_002 ##########
[Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pir_002 on fas6240d
[Thu Jan 31 18:30:03 2013] INFO: NetApp Snapshot Inventory of d_pir_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pit_002 ##########
[Thu Jan 31 18:30:03 2013] INFO: Performing NetApp Snapshot Inventory for d_pit_002 on fas6240d
[Thu Jan 31 18:30:04 2013] INFO: NetApp Snapshot Inventory of d_pit_002 on fas6240d completed Successfully

########## Gathering Information for fas6240d:d_pia_003 ##########
[Thu Jan 31 18:30:04 2013] INFO: Performing NetApp Snapshot Inventory for d_pia_003 on fas6240d
[Thu Jan 31 18:30:04 2013] INFO: NetApp Snapshot Inventory of d_pia_003 on fas6240d completed Successfully

########## Running NetApp Snapshot Rename on Primary fas6240d ##########
[Thu Jan 31 18:30:04 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pid_002
[Thu Jan 31 18:30:08 2013] INFO: NetApp Snapshot Rename for d_pid_002 on fas6240d completed Successfully
[Thu Jan 31 18:30:08 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pil_002
[Thu Jan 31 18:30:09 2013] INFO: NetApp Snapshot Rename for d_pil_002 on fas6240d completed Successfully
[Thu Jan 31 18:30:09 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pir_002
[Thu Jan 31 18:30:10 2013] INFO: NetApp Snapshot Rename for d_pir_002 on fas6240d completed Successfully
[Thu Jan 31 18:30:10 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pit_002
[Thu Jan 31 18:30:11 2013] INFO: NetApp Snapshot Rename for d_pit_002 on fas6240d completed Successfully
[Thu Jan 31 18:30:11 2013] INFO: Rename the recent snapshot from combsco_cl_tau-daily_recent to combsco_cl_tau-daily_20130131033015 for fas6240d:d_pia_003
[Thu Jan 31 18:30:12 2013] INFO: NetApp Snapshot Rename for d_pia_003 on fas6240d completed Successfully

########## Creating snapshot(s) ##########
[Thu Jan 31 18:30:12 2013] INFO: NetApp Snap Creator Framework 3.5.0 detected that SnapDrive is not being used. File system consistency cannot be guaranteed for SAN/iSAN environments

########## Starting Consistency Group Snapshot for fas6240d ##########
[Thu Jan 31 18:30:12 2013] INFO: Starting NetApp Consistency Group Snapshot combsco_cl_tau-daily_recent for volumes d_pid_002,d_pil_002,d_pir_002,d_pit_002,d_pia_003 on fas6240d
[Thu Jan 31 18:30:12 2013] INFO: NetApp Consistency Group Start for fas6240d volumes d_pid_002,d_pil_002,d_pir_002,d_pit_002,d_pia_003 succeeded!

########## Committing Consistency Group Snapshot for all filers ##########
[Thu Jan 31 18:30:12 2013] INFO: Commiting NetApp Consistency Group Snapshot for fas6240d succeeded!

[Thu Jan 31 18:30:12 2013] INFO: Creation of NetApp Consistency Group Snapshot combsco_cl_tau-daily_recent on fas6240d Completed Successfully

########## PRE APPLICATION UNQUIESCE COMMANDS ##########
[Thu Jan 31 18:30:12 2013] INFO: No commands defined

########## PRE APPLICATION UNQUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## Application unquiesce ##########
[Thu Jan 31 18:30:12 2013] [comet:9090(3.5.0.1)] INFO: Unquiescing databases
[Thu Jan 31 18:30:13 2013] [comet:9090(3.5.0.1)] INFO: Informix database unquiesced
[Thu Jan 31 18:30:13 2013] [comet:9090(3.5.0.1)] INFO: Unquiescing database finished successfully

########## POST APPLICATION UNQUIESCE COMMANDS ##########
[Thu Jan 31 18:30:13 2013] INFO: No commands defined

########## POST APPLICATION UNQUIESCE COMMANDS FINISHED SUCCESSFULLY ##########

########## PRE CLONE CREATE ##########

########## Pre clone handling for plugin INFORMIX ##########

########## UMOUNT COMMANDS ##########

[Thu Jan 31 18:30:15 2013] INFO: No commands defined

########## UMOUNT COMMANDS FINISHED SUCCESSFULLY ##########

########## PRE CLONE COMMANDS ##########
[Thu Jan 31 18:30:15 2013] INFO: Running pre clone command PRE_CLONE_CREATE_CMD01 [/usr/local/bin/preclone_taubsco.sh]

[Thu Jan 31 18:40:15 2013] ERROR: [scf-00103] Running pre clone command [/usr/local/bin/preclone_taubsco.sh] failed with exit code [1] and message [500 read timeout at /</usr/local/scServer3.5.0/snapcreator>SnapCreator/Agent/Remote.pm line 474 ]

########## PRE CLONE COMMANDS FINISHED WITH FAILURES ##########

########## PRE EXIT COMMANDS ##########

[Thu Jan 31 18:40:15 2013] INFO: No commands defined

########## PRE EXIT COMMANDS FINISHED SUCCESSFULLY ##########

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

It appears the script /usr/local/bin/preclone_taubsco.sh is running for longer than 60 seconds. What is weird is that you have SC_AGENT_TIMEOUT=600, so it should wait longer than that.
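One way to confirm what the script is actually doing is to point PRE_CLONE_CREATE_CMD01 at a small timing wrapper for a few runs, so you can see its real wall-clock runtime and exit code. This is just a sketch; the wrapper name and log path are made up, so adjust them for your environment:

#!/bin/sh
# preclone_taubsco_timed.sh - hypothetical wrapper that logs runtime and exit code
START=$(date +%s)
/usr/local/bin/preclone_taubsco.sh
RC=$?
END=$(date +%s)
echo "$(date) preclone_taubsco.sh ran for $((END - START))s, exit code $RC" >> /tmp/preclone_timing.log
exit $RC

If the wrapper shows the script finishing well under 600 seconds, the timeout is almost certainly happening on the agent/server side rather than in the script itself.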

I am really not sure on this one. I would try SC 3.6p1 and see if that helps. It could be a bug somewhere where we aren't waiting longer than 60 seconds for SC_CLONE_TARGET.

In addition, you could run the agent in multithreaded mode on taurus and see if that helps:

./snapcreator --start-multithreaded-agent 9090
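For example, on taurus you could stop the currently running agent and then start it in the background in multithreaded mode. The install path below is an assumption (based on your /usr/local/scServer3.5.0 path on the server side), so adjust it as needed:

cd /usr/local/scAgent3.5.0      # assumed agent install directory on taurus
ps -ef | grep snapcreator       # identify and stop the current agent process first
nohup ./snapcreator --start-multithreaded-agent 9090 > agent_mt.out 2>&1 &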

My thought is that other operations may be happening at the same time, which is why it is timing out.

Regards,

Keith

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

1) You have mentioned that it fails randomly, but I still have to ask: does the scAgent/config/agent.conf file allow /usr/local/bin/preclone_taubsco.sh to be run? (See the example entries after this list.)

2) Please try bumping up the agent timeout settings as below.

SC_AGENT_TIMEOUT=900

SC_AGENT_UNQUIESCE_TIMEOUT=60
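For reference, the relevant entries would look roughly like the following. The server-side parameter names are the ones from this thread; the agent.conf syntax is from memory, so please verify it against the documentation for your Snap Creator version:

# Snap Creator config on the server
SC_AGENT_TIMEOUT=900
SC_AGENT_UNQUIESCE_TIMEOUT=60
SC_CLONE_TARGET=taurus:9090
PRE_CLONE_CREATE_CMD01=/usr/local/bin/preclone_taubsco.sh

# scAgent/config/agent.conf on taurus (assumed syntax) - allows the server to run the script
command: /usr/local/bin/preclone_taubsco.sh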

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

Yes, /usr/local/bin/preclone_taubsco.sh has the required permissions; this clone process runs twice a day and works for days before blowing up. I will try changing these settings and see if that helps.

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

Great. Thank you.

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

Well the timeout value changes didn't help. My 18:30 run failed with the same issue.

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

Ok. Thanks for letting us know.

I will research this further with the team.

Will you be able to stop the agent on taurus and restart it in multithreaded mode, as Keith suggested above?

If you have any operating system logs covering that 18:30 run time frame, please forward them to me, along with any agent logs from Snap Creator.
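For example, something along these lines would be enough, depending on where your operating system keeps its logs (the paths below are just common defaults, not a requirement):

# system messages around the 18:30 window - the log location is OS-dependent
grep "Jan 31 18:3" /var/log/messages
# any Snap Creator agent output on taurus from the same window (path assumed)
ls -ltr /usr/local/scAgent3.5.0/logs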

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

I reviewed the Remote.pm source code, and my suspicion is that the agent is not returning its version properly.

It could be that the agent was not running at the time of the clone.

Are you manually restarting the agent every time after this error occurs, or does it work automatically on the next run?

Also, please try putting the IP address instead of the hostname in the line below, and let it run for a few days.

From

SC_CLONE_TARGET=taurus:9090

To

SC_CLONE_TARGET=x.x.x.x:9090
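While that is running, it may also be worth checking whether name resolution for taurus is ever slow on the Snap Creator server, since an intermittent DNS delay would explain random timeouts like this. A quick check (run on the server; this is only a suggestion, not a confirmed cause):

time nslookup taurus          # see how long resolving the agent hostname takes
grep taurus /etc/hosts        # a static hosts entry avoids DNS lookups entirely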

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

I never restart the agent. I just rerun the clone process and it works. I have changed SC_CLONE_TARGET to the IP address.

Re: Getting random timeout on PRE_CLONE_CREATE_CMD01

Thank you. Please keep monitoring this for a week, and let us know if it still fails.

I will follow up on this within my team as well.

Thanks,

Siva Ramanathan