Solved: CIFS drive share consistently stops working after a week

RalphT · 2020-06-19

A Windows Server 2016 machine has a drive mapped to a filer, and after about a week the mapping disconnects. It happens like clockwork, roughly every 7 days, so I now have to do a pre-emptive reboot of the Windows client once a week. Attempting to reconnect from the Windows command line with a net use command gives the error message “System error 8 has occurred. Not enough storage is available to process this command”. A reboot always fixes it.

Searching for that error always turns up the advice to increase the IRPStackSize parameter, but I think that applies to the target machine, and only if the target is a Windows machine rather than a NetApp. We did increase it on the client anyway, with no change.

This all started when ONTAP was upgraded in October 2019. I'll try to find out which version it was upgraded from and to; I don't have that information right now. Before that, the same setup ran for a year straight with no problems whatsoever.

Thanks for any help.

Mjizzini · 2020-07-09

I have seen this issue when the filer receives too many session requests from the same user on one TCP connection. In that case you should see the following errors in EMS:

Nblade.cifsMaxSessPerUsrConn:error]: Received too many session requests from the same user on one TCP connection

Nblade.cifsMaxSessPerUsrConnNotice:notice]: Received xxxx session requests, nearing the configured limit of
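To check whether the filer is actually logging these events, you can count them in an exported copy of the EMS log. This is a minimal sketch, assuming the log has been saved to a plain-text file whose name is passed on the command line (the file and function names are hypothetical). It relies only on the two event names quoted above, not on the full EMS line layout, which is truncated in the messages shown.

```python
import re
import sys
from collections import Counter

# Matches the two event names quoted above, e.g. "Nblade.cifsMaxSessPerUsrConn:error]".
# Assumption: each event name is followed by ":<severity>]" as in the quoted lines.
EVENT_RE = re.compile(r"Nblade\.(cifsMaxSessPerUsrConn(?:Notice)?):\w+\]")

def count_session_limit_events(lines):
    """Count cifsMaxSessPerUsrConn and cifsMaxSessPerUsrConnNotice events."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_ems.py ems_export.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for event, n in sorted(count_session_limit_events(f).items()):
            print(f"{event}: {n}")
```

A steadily rising count of the Notice event before each failure would support this diagnosis; no matches at all would point elsewhere.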

Corrective action: Inspect the application running on the client using this TCP connection. The client might be operating incorrectly due to the application running on it. Rebooting the client might also be helpful. In some cases, clients are operating as expected but require a higher threshold, which you can set using the (privilege: advanced) "cifs option modify -max-same-user-sessions-per-connection" command.

The default setting is 800, which may not be enough for your requirements. If increasing it to 2000 does not fix the issue, you will need to troubleshoot the rogue client that is causing it.
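Whether raising the limit helps depends on whether the client's sessions level off or keep accumulating. Assuming a steady, linear leak from zero (an assumption, not something confirmed in this thread), the weekly failure and the 800 default give a rough leak rate, and the same rate predicts how long a limit of 2000 would last. The function names are illustrative only.

```python
def leak_rate_per_day(limit, days_to_failure):
    """Sessions leaked per day, assuming a constant leak starting from zero."""
    return limit / days_to_failure

def days_until_limit(limit, rate_per_day):
    """Days until a linearly growing session count reaches `limit`."""
    return limit / rate_per_day

rate = leak_rate_per_day(800, 7)  # failure roughly every 7 days at the 800 default
print(f"leak rate: {rate:.1f} sessions/day")
print(f"limit 2000 lasts: {days_until_limit(2000, rate):.1f} days")
```

Under that assumption a limit of 2000 only stretches the interval from about a week to about 17 or 18 days, so a failure that merely moves later, rather than disappearing, points back at the client rather than the threshold.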

Ontapforrum · 2020-06-19

You haven't provided any hardware/software details about the NetApp system.

Could you tell us:

1) Filer Model

2) Ontap version

3) Which version/patch level you upgraded from in Oct 2019

4) Are quotas set up?

5) The size of the CIFS volume

6) Date/Time on Filer

7) Date/Time on Win2016 (time on AD and the CIFS server should be within a few minutes of each other)
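Item 7 matters because in an Active Directory domain the CIFS server authenticates clients with Kerberos, which rejects requests when clocks differ by more than the permitted skew (5 minutes under the default Windows domain policy). A minimal sketch for comparing the two clocks, assuming you have already read the filer and client times and converted them to UTC; the function names and example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos maximum clock skew under Windows domain policy.
MAX_SKEW = timedelta(minutes=5)

def clock_skew(filer_time, client_time):
    """Absolute difference between two timezone-aware datetimes."""
    return abs(filer_time - client_time)

def within_kerberos_skew(filer_time, client_time, max_skew=MAX_SKEW):
    """True if the two clocks are close enough for Kerberos authentication."""
    return clock_skew(filer_time, client_time) <= max_skew

# Example values only; replace with the times read from each system.
filer = datetime(2020, 6, 19, 12, 0, 0, tzinfo=timezone.utc)
client = datetime(2020, 6, 19, 12, 7, 30, tzinfo=timezone.utc)
print(clock_skew(filer, client), within_kerberos_skew(filer, client))
```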

However, the error you mentioned points to the client side (Windows Server). Was there any patching or were there any updates during that period that could have caused this?

TMACMD · 2020-06-19

What about free space on the client side?

Maybe the C:\ volume is full and a reboot cleans up enough?

RalphT · 2020-06-22

The C: drive has plenty of free space. I did try to dig into desktop heap, since that came up in some Google searches, but it got too deep for me. There was a recommendation for a setting change, but since this is an important production server I am hesitant to mess with that.

I agree the problem is with the Windows system, but it was apparently triggered by some change between ONTAP versions. At one point I experimented with a different batch file that transferred files under a different user, and the system behaved quite differently: it spawned many new processes and eventually needed a reboot after only three to five days. That does not happen with the current setup; the number of processes stays constant at a low 81. So I think there is a good chance the problem is related to desktop heap and the environment a Windows command file runs in.
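If the process count is the early-warning sign, recording it per image name at regular intervals would make a slow build-up (for instance of extra cmd.exe instances spawned by the batch file) visible before the share fails. A minimal sketch, assuming Python is available on the server; it parses the CSV output of the built-in `tasklist /fo csv /nh` command and only runs the live query on Windows. The function name is hypothetical.

```python
import csv
import io
import os
import subprocess
from collections import Counter

def count_by_image(tasklist_csv):
    """Count processes per image name in `tasklist /fo csv /nh` output."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:                    # skip blank lines
            counts[row[0]] += 1    # first column is the image name
    return counts

if __name__ == "__main__" and os.name == "nt":
    out = subprocess.run(["tasklist", "/fo", "csv", "/nh"],
                         capture_output=True, text=True, check=True).stdout
    counts = count_by_image(out)
    print("total processes:", sum(counts.values()))
    for name, n in counts.most_common(5):  # the five most frequent images
        print(f"{n:4d}  {name}")
```

Scheduling this daily and keeping the output gives a baseline (such as the 81 processes mentioned above) to compare against when the next failure approaches.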

RalphT · 2020-06-22

There was no patching or other updates at that time.

GidonMarcus · 2020-06-22

Hi.

Along with the info @Ontapforrum requested, the next time it happens I would very much like to see the Windows System/Security event logs from that time, the ONTAP EMS log, and a packet trace (which I guess will be a problem to share here).

One other thing: are there any unusual network devices in the path? A WAN optimizer (Riverbed?), an application-aware firewall, DLP, or some sort of non-standard VPN?

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

RalphT · 2020-06-23

Gidon, it will be tough to provide those things, partly because I always do pre-emptive reboots before the problem recurs.

There is a stateful firewall in between, but it has not changed and has never caused a problem like this before.

paul_stejskal · 2020-07-06

Can you take a planned outage, for example a reboot on Saturday night, and then collect the data the following week when the problem recurs?

Without knowing what your controller is, it's hard to say more. If you have a serial number and a timestamp, we can check whether anything was logged in ASUPs.
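If you do let it fail, a precise failure time makes it much easier to line up the Windows event logs, the EMS log and any ASUP data. A minimal sketch, assuming Python on the Windows client; the script name, share path and 5-minute polling interval are hypothetical placeholders. Use the UNC path rather than the drive letter, because mapped drive letters belong to a single logon session and may not be visible to the account running the script.

```python
import os
import sys
import time
from datetime import datetime, timezone

def share_is_reachable(path):
    """True if the directory at `path` can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False

def wait_for_failure(path, interval_s=300, max_checks=None):
    """Poll `path`; return the UTC time of the first failed check, or None
    if `max_checks` successful checks complete without a failure."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if not share_is_reachable(path):
            return datetime.now(timezone.utc)
        checks += 1
        time.sleep(interval_s)
    return None

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python watch_share.py \\filer\share
    failed_at = wait_for_failure(sys.argv[1])
    print("share unreachable at", failed_at.isoformat())
```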

RalphT · 2020-07-07

Paul,

I can let it fail in order to get diagnostics, so I'll try that.

Thanks.
