SC 3.6 with PM sometimes "cannot connect to socket"

brauntvr2swiss

Hi All

Yesterday we updated Snap Creator from 3.5 to 3.6. Since this update, the connection between SC and PM sometimes fails with this error:

[Fri Aug 31 09:50:39 2012] TRACE: ZAPI RESULT

<results status="failed" reason="in Zapi::invoke, cannot connect to socket" errno="13001"></results>

[Fri Aug 31 09:50:39 2012] ZAPI:  (code = )

I don't think it's a configuration issue, because under 3.5 the jobs ran without problems, and in 3.6 you can see that the communication and the backup do start:

########## Getting Protection Manager backup progress ##########

[Fri Aug 31 09:47:37 2012] INFO: Getting Protection Manager backup progress for job-id 47756

[Fri Aug 31 09:47:37 2012] INFO: Protection Manager backup progress get for job-id 47756 completed successfully

[Fri Aug 31 09:47:37 2012] INFO: Protection Manager backup for job-id 47756 is running, Sleeping 1 minute

[Fri Aug 31 09:48:37 2012] INFO: Getting Protection Manager backup progress for job-id 47756

[Fri Aug 31 09:48:38 2012] INFO: Protection Manager backup progress get for job-id 47756 completed successfully

[Fri Aug 31 09:48:38 2012] INFO: Protection Manager backup for job-id 47756 is running, Sleeping 1 minute

[Fri Aug 31 09:49:38 2012] INFO: Getting Protection Manager backup progress for job-id 47756

[Fri Aug 31 09:49:38 2012] INFO: Protection Manager backup progress get for job-id 47756 completed successfully

[Fri Aug 31 09:49:38 2012] INFO: Protection Manager backup for job-id 47756 is running, Sleeping 1 minute

[Fri Aug 31 09:50:38 2012] INFO: Getting Protection Manager backup progress for job-id 47756

[Fri Aug 31 09:50:39 2012] ZAPI:  (code = )

Does anyone know about this problem, or has anyone had the same issue with 3.6?

TIA

Thomas


ktenzer

I haven't heard of this issue, but in 3.6 there is an NTAP_TIMEOUT parameter used for ZAPI calls. It defaults to 60 seconds, so if things are slow network-wise, a call could time out.

Before 3.6, SC simply waited forever for ONTAP or DFM; now there is a configurable timeout. Maybe try setting this. If the parameter isn't set, the default is 60 seconds.

Still, it sounds like you are having a network issue. Why would the TCP connection get broken?
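One way to look for an intermittent network problem is to probe the TCP port of the PM/DFM server repeatedly from the SC host and record any failures. The following is only a minimal sketch: the function names `probe` and `watch` are made up for illustration, and the host and port in the usage comment are hypothetical placeholders, not values from this thread.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return False, time.monotonic() - start, str(exc)


def watch(host, port, attempts=10, interval=1.0):
    """Probe repeatedly and return a list of (attempt, seconds, error) failures."""
    failures = []
    for attempt in range(attempts):
        ok, elapsed, error = probe(host, port)
        if not ok:
            failures.append((attempt, elapsed, error))
        if attempt < attempts - 1:
            time.sleep(interval)
    return failures


# Usage (hypothetical host and port, replace with your own):
#   for attempt, elapsed, error in watch("dfm.example.com", 8488, 60, 60.0):
#       print(f"attempt {attempt}: failed after {elapsed:.2f}s: {error}")
```

If the probe fails at roughly the same times as the SC jobs, that would point to the environment rather than to SC itself. If it never fails while SC does, the network is a less likely cause.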

Keith

brauntvr2swiss

Hi Keith

Thanks for your quick response. We have increased the timeout (NTAP_TIMEOUT=1800), but it's not working.
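To rule out a typo or a commented-out line, the effective value can be checked with a small script. This is a sketch only: it assumes the config file uses plain KEY=VALUE lines with `#` comments, the helper name `effective_ntap_timeout` and the `OTHER_PARAMETER` key are hypothetical, and the fallback of 60 seconds is the default Keith mentions above.

```python
def effective_ntap_timeout(config_text, default=60):
    """Return NTAP_TIMEOUT in seconds, or the default if it is not set."""
    value = default
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        # If the key appears more than once, the last occurrence wins.
        if sep and key.strip() == "NTAP_TIMEOUT" and val.strip():
            value = int(val.strip())
    return value


# Hypothetical excerpt of a config file, for illustration only.
sample = """
OTHER_PARAMETER=value
#NTAP_TIMEOUT=60
NTAP_TIMEOUT=1800
"""
print(effective_ntap_timeout(sample))  # prints 1800
```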

This ZAPI error occurs at different steps. Sometimes at the step:

########## Creating Protection Manager Backup Version ########## (in this case the job in PM doesn't start and the SC job turns red)

Sometimes at the step:

########## Getting Protection Manager backup progress ########## (in this case the job in PM runs successfully, but the SC job still turns red)

It could be a timeout, but it's not the NTAP_TIMEOUT. The config that no longer works is the biggest in our environment (62 primary volumes).

regards,

Thomas

brauntvr2swiss

...And the time until the error varies. For example:

Time from ########## Checking Protection Manager dataset snapcreator_AllDBs ########## until the ZAPI error: 32s

Time from ########## Checking Protection Manager dataset snapcreator_AllDBs ########## until the ZAPI error: 3m17s

I think I will go back to 3.5 now to test whether it runs without problems.
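Delays like these can be measured directly from the SC log instead of by hand. The sketch below assumes the log format shown in the first post: step banners made of `#` characters without a timestamp, and other lines starting with a bracketed timestamp. Because the banner carries no timestamp, it counts from the first timestamped line after the banner. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches a leading stamp such as "[Fri Aug 31 09:50:39 2012]".
STAMP = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")


def parse_stamp(line):
    """Return the datetime at the start of a log line, or None."""
    match = STAMP.match(line)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def seconds_until_zapi_error(lines, step):
    """Seconds from the first stamp after the step banner to the ZAPI line."""
    in_step = False
    start = None
    for line in lines:
        if line.startswith("#") and step in line:
            in_step = True   # banner found, start measuring from here
            start = None
            continue
        if not in_step:
            continue
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if start is None:
            start = stamp
        if "] ZAPI:" in line:
            return (stamp - start).total_seconds()
    return None


# Lines taken from the log in the first post (shortened).
log = [
    "########## Getting Protection Manager backup progress ##########",
    "[Fri Aug 31 09:47:37 2012] INFO: Getting Protection Manager backup progress for job-id 47756",
    "[Fri Aug 31 09:50:38 2012] INFO: Getting Protection Manager backup progress for job-id 47756",
    "[Fri Aug 31 09:50:39 2012] ZAPI:  (code = )",
]
print(seconds_until_zapi_error(log, "Getting Protection Manager backup progress"))  # prints 182.0
```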

ktenzer

This is strange behavior. Unfortunately, I really don't have enough info to help further.

If you provide detailed instructions on how to reproduce the problem, we can try to reproduce it; otherwise there isn't much more I can do.

Keith

brauntvr2swiss

Thanks anyway for your response. I have downgraded to version 3.5 and my patient now runs fine.

regards

Thomas

brauntvr2swiss

Hi Keith

Should I open a case about this problem? 3.5 has been running fine for a few days now, so I think it is a problem in version 3.6.

regards

Thomas

ktenzer

Sure, but we need to know how to reproduce this problem, which means the exact configuration. If we could reproduce it, we would have fixed it already, so something is different about your environment. I definitely recommend opening an NGS case, as that is the best way to get the engineering team involved.

Code-wise there really is no difference: we didn't even touch this code in 3.6, so it is very strange that it works in 3.5 and not in 3.6. The fact that the problem is also intermittent in 3.6 leads me to believe it isn't an issue with 3.6 but rather something in the environment. I would expect you to see this problem in 3.5 too.

Can you please give us the steps to reproduce, the config files used, and anything else you can think of? If you document it here, I can give it to QA to see if they can reproduce it. If we can't reproduce the problem, we can't fix it. It's that simple.

Regards,

Keith

ktenzer

SC 3.6 has a bug that prevents existing schedules from running. The workaround is to update the schedules manually and set the start date to the current date, then save the schedules and restart the server.

You can also go to SC 3.6P1, which fixes this issue, so schedules will run without any manual changes.

If you are having issues upgrading, work with NGS. SC 3.5 has several scheduler bugs, so if these are painful and you can't upgrade, just don't use schedules; use cron, Task Manager, or whatever else exists.

Regards,

Keith

ktenzer

I should add that in SC 3.6, only schedules created in a previous version and carried over by the upgrade won't run; newly created schedules run fine.

Keith

brauntvr2swiss

Hi All

FYI:

Since yesterday we have been running Snap Creator 3.6P1 with OnCommand 5.1. Until a few minutes ago everything looked good, but now a job has failed with the error "in Zapi::invoke, cannot connect to socket" (errno="13001") :-(

The scheduler works as expected (no scheduled jobs get lost). Now I have to open a case.

regards

Thomas
