Hi Sen,
I was able to get around this issue with a workaround, but the fact remains that "na_server_open()" hangs if it is called while the cluster is rebooting.
Previously, my code logic was:
---------------------------------------------------
while() {
Open a server connection using "na_server_open()" and get the server context.
Set the timeout on the server connection using "na_server_set_timeout()".
Use the server context to read latency, fpolicy status, etc.
Close the server context using "na_server_close()".
}
---------------------------------------------------
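For reference, the original per-iteration pattern can be sketched in C roughly as follows. This is a minimal sketch, not my actual code: fake_server_t and the fake_server_* functions are hypothetical stand-ins for na_server_t, na_server_open(), na_server_set_timeout() and na_server_close() so that it compiles without the SDK, and the 30-second timeout is only an illustrative value. The open_calls counter shows that the open path runs once per iteration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SDK's server context (na_server_t). */
typedef struct {
    int timeout_sec;
} fake_server_t;

/* Number of times the open path has run. */
static int open_calls = 0;

/* Hypothetical stand-in for na_server_open(). */
static fake_server_t *fake_server_open(void) {
    open_calls++;
    return calloc(1, sizeof(fake_server_t));
}

/* Hypothetical stand-in for na_server_set_timeout(). */
static void fake_server_set_timeout(fake_server_t *s, int seconds) {
    assert(s != NULL);
    s->timeout_sec = seconds;
}

/* Hypothetical stand-in for the latency / fpolicy status queries.
 * Returns 0 on success. */
static int fake_read_stats(const fake_server_t *s) {
    return s->timeout_sec > 0 ? 0 : -1;
}

/* Hypothetical stand-in for na_server_close(). */
static void fake_server_close(fake_server_t *s) {
    free(s);
}

/* Original pattern: open, set timeout, query and close on every iteration,
 * so the open path runs `iterations` times. Returns 0 on success. */
static int poll_per_iteration(int iterations) {
    for (int i = 0; i < iterations; i++) {
        fake_server_t *s = fake_server_open();
        if (s == NULL)
            return -1;
        fake_server_set_timeout(s, 30); /* illustrative value */
        int rc = fake_read_stats(s);
        fake_server_close(s);
        if (rc != 0)
            return -1;
    }
    return 0;
}
```

With this pattern, every iteration goes through the open path again, so any iteration that happens to coincide with a cluster reboot can get stuck there.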
My code logic with the workaround:
---------------------------------------------------
while() {
if (global server context is not initialized) {
Open a server connection using "na_server_open()", get the server context, and store it in a global variable so it can be reused in the next iteration.
Set the timeout on the server connection using "na_server_set_timeout()".
}
Use the global server context to read latency, fpolicy status, etc.
}
Close the global server context using "na_server_close()" when it is no longer required or when the application exits.
---------------------------------------------------
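The workaround can be sketched the same way. Again, this is a minimal sketch under the same assumptions: the fake_server_* functions are hypothetical stand-ins for the NMSDK calls, and g_server, get_server() and shutdown_server() are illustrative names, not part of the SDK. The point is that the open path runs only once, on first use, and the cached context is reused afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SDK's server context (na_server_t). */
typedef struct {
    int timeout_sec;
} fake_server_t;

/* Number of times the open path has run. */
static int open_calls = 0;

/* Global cached context, opened lazily on first use (hypothetical name). */
static fake_server_t *g_server = NULL;

/* Hypothetical stand-in for na_server_open(). */
static fake_server_t *fake_server_open(void) {
    open_calls++;
    return calloc(1, sizeof(fake_server_t));
}

/* Hypothetical stand-in for na_server_set_timeout(). */
static void fake_server_set_timeout(fake_server_t *s, int seconds) {
    assert(s != NULL);
    s->timeout_sec = seconds;
}

/* Hypothetical stand-in for the latency / fpolicy status queries.
 * Returns 0 on success. */
static int fake_read_stats(const fake_server_t *s) {
    return s->timeout_sec > 0 ? 0 : -1;
}

/* Hypothetical stand-in for na_server_close(). */
static void fake_server_close(fake_server_t *s) {
    free(s);
}

/* Returns the cached context, opening and configuring it only if it has
 * not been initialized yet. */
static fake_server_t *get_server(void) {
    if (g_server == NULL) {
        g_server = fake_server_open();
        if (g_server != NULL)
            fake_server_set_timeout(g_server, 30); /* illustrative value */
    }
    return g_server;
}

/* Workaround pattern: reuse the global context on every iteration.
 * Returns 0 on success. */
static int poll_cached(int iterations) {
    for (int i = 0; i < iterations; i++) {
        fake_server_t *s = get_server();
        if (s == NULL)
            return -1;
        if (fake_read_stats(s) != 0)
            return -1;
    }
    return 0;
}

/* Closes the global context when it is no longer needed (e.g. at exit). */
static void shutdown_server(void) {
    if (g_server != NULL) {
        fake_server_close(g_server);
        g_server = NULL;
    }
}
```

Calling poll_cached(5) opens the context once and reuses it for all five iterations; shutdown_server() releases it at exit. This only reduces how often the open path runs; it does not bound how long any single call can block.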
With this workaround, the chance of "na_server_open()" being called during a reboot is very small, but the fact remains that "na_server_open()" hangs when it is called during a cluster reboot.
Now, the answers to your questions:
Yes, I am calling "na_server_open()" in each iteration.
Yes, I am using "na_server_close()" at the end of each iteration to free the memory allocated for the server context.
(1) NMSDK version: 5.1.
(2) Host platform: 64-bit Windows Server 2008, 6 GB RAM, Intel(R) Xeon(R) CPU (2 processors).
(3) ONTAP version: 8.2.
(4) I have a small Visual Studio project whose code will help you reproduce and understand the issue. Please let me know how I can share it with you. In the meantime, I hope the logic above gives you some idea.
(5) The number of threads does not matter; the issue reproduces even with a single main thread.
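Here is the WinDbg thread stack captured when "na_server_open()" hangs, in case it helps. The thread is blocked in WS2_32!recv, reached through zephyr!na_server_invoke from my server_open_conn() wrapper (ltntap.c, line 52):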
--------------------------------------------
0 Id: 3f34.2250 Suspend: 1 Teb: 000007ff`fffdd000 Unfrozen
# Child-SP RetAddr Call Site
00 00000000`0021dd48 000007fe`fd7d0555 ntdll!NtWaitForSingleObject+0xa
01 00000000`0021dd50 000007fe`fd7d295e mswsock!_GSHandlerCheck_SEH+0x4269
02 00000000`0021ddd0 000007fe`ff532a7c mswsock!_GSHandlerCheck_SEH+0x776a
03 00000000`0021dec0 00000001`80016e4c WS2_32!recv+0x13c
04 00000000`0021df60 00000001`80002a15 zephyr!shttpc_read+0x8c
05 00000000`0021e3f0 00000001`80002ae7 zephyr!http_free_url+0x1c5
06 00000000`0021e430 00000001`80003af4 zephyr!http_strip_headers+0x37
07 00000000`0021ec90 00000001`80003c51 zephyr!http_open_url_socket+0x6d4
08 00000000`0021f5c0 00000001`80003f66 zephyr!http_post_request_ex+0x91
09 00000000`0021f610 00000001`800082eb zephyr!http_post_request+0x26
0a 00000000`0021f660 00000001`800093a3 zephyr!na_child_add_int+0x87b
0b 00000000`0021f780 00000001`80009407 zephyr!na_server_invoke_elem+0xf3
0c 00000000`0021f7b0 00000001`3fde175d zephyr!na_server_invoke+0x27
0d 00000000`0021f7e0 00000001`3fde1936 latencytest!server_open_conn(struct mx_cmd_s * cmd = 0x00000000`008167f0, char ** errstr = 0x00000000`00000279)+0xbd [c:\anshul\workspace\mywork\latencytest\latencytest\ltntap.c @ 52]
0e 00000000`0021f830 00000001`3fde1528 latencytest!ntap_open_vserv_conn(struct mx_cmd_s * cmd = 0x00000000`00000000)+0x56 [c:\anshul\workspace\mywork\latencytest\latencytest\ltntap.c @ 160]
0f 00000000`0021f860 00000001`3fde1fe2 latencytest!main(int argc = 0n0, char ** argv = 0x00000000`00000001)+0xb8 [c:\anshul\workspace\mywork\latencytest\latencytest\latencytest.c @ 260]
10 00000000`0021f8c0 00000000`7755f56d latencytest!__tmainCRTStartup(void)+0x11a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 582]
11 00000000`0021f8f0 00000000`77b53281 kernel32!BaseThreadInitThunk+0xd
12 00000000`0021f920 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
--------------------------------------------