Re: How to solve NFS v4 Problem: Nblade.nfsV4PoolExhaust: NFS Store Pool for Client exhausted

ubimet · ‎2019-06-12

Hi,

we have a problem with one of our filers. It is a two-node cluster built from 8060 nodes.

From time to time we get these errors:

    Event: Nblade.nfsV4PoolExhaust: NFS Store Pool for Client exhausted. Associated object type is CLUSTER_NODE with UUID: qqqqqqq.
    Message Name: Nblade.nfsV4PoolExhaust
    Sequence Number: 12361461
    Description: This message occurs when one of the NFSv4 store pools is exhausted.
    Action: If the NFS server is unresponsive for more than 10 minutes after this error occurs, contact NetApp technical support.

It seems that this error doesn't cause any issues, but I would like to get rid of it. You never know when it might cause an outage.

If I understand this error correctly, there are no client connections left on that node. Is that correct?

We do have several hundred VMs using several different volumes on this node. The other node has fewer client connections...

What is the best way to debug this?

Thank you very much!

BR Florian

donny_lang · ‎2019-06-12

This KB article:

https://kb.netapp.com/app/answers/answer_view/a_id/1017079/~/nfsv4-file-access-fails-due-to-locking-storepool-exhaustion-

explains the behavior and how to start troubleshooting the issue, including which data to collect for support should you choose to engage them for the issue.

ubimet · ‎2019-06-12

Hi,

thank you very much!

I had also found this article, but I was not logged in, so I could only see the symptoms and nothing more!

ubimet · ‎2019-06-13

Hi,

I was able to debug this issue, but I don't have a solution for it.

Can someone explain to me how NFSv4 ends up with so many client connections?

    Counter                                                     Value
    -------------------------------- --------------------------------
    storePool_ByteLockAlloc                                         0
    storePool_ByteLockMax                                     1024002
    storePool_ClientAlloc                                       65356
    storePool_ClientMax                                        102401
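The pool counters above can be turned into a fill percentage with a little arithmetic. A minimal sketch, assuming the table has been saved to a plain text file; the file name and the `pool_usage` function are hypothetical helpers, not part of ONTAP:

```shell
#!/bin/sh
# pool_usage FILE  (hypothetical helper)
# FILE holds the counter table as pasted above: counter name, then value.
# For every storePool_<Name>Alloc / storePool_<Name>Max pair it prints
# "<Name> <alloc>/<max> <percent used>", sorted by pool name.
pool_usage() {
  awk '
    # "storePool_" is 10 characters, so the pool name starts at position 11;
    # strip the trailing "Alloc" (5 chars) or "Max" (3 chars).
    $1 ~ /^storePool_.*Alloc$/ { alloc[substr($1, 11, length($1) - 15)] = $2 }
    $1 ~ /^storePool_.*Max$/   { max[substr($1, 11, length($1) - 13)] = $2 }
    END {
      for (n in alloc)
        if ((n in max) && max[n] > 0)
          printf "%s %d/%d %.1f%%\n", n, alloc[n], max[n], 100 * alloc[n] / max[n]
    }
  ' "$1" | sort
}
```

For example, `pool_usage counters.txt`. With the values posted above it reports the client pool at 65356 of 102401 entries (about 63.8%) and the byte-lock pool at 0%, so at the moment of that sample the client pool was not full; the EMS event presumably reflects peaks between samples.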

At that time, I saw only six IPs holding locks on that node:

    ssh xxx@192.168.1.99 "set d -c off; rows 0; vserver locks nfsv4 show -inst" | tee locks.txt | grep -i "client name" | sort | uniq -c |sort -n
      1              Client Name: 0.0.0.0/192.168.29.11 tcp NULL 0
      1              Client Name: Linux NFSv4.0 192.168.20.132
      1              Client Name: Linux NFSv4.0 192.168.20.148
      1              Client Name: Linux NFSv4.0 192.168.26.116
      2              Client Name: Linux NFSv4.0 192.168.20.109
      5              Client Name: Linux NFSv4.0 192.168.26.141
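The pipeline above counts lock entries per client-name string. A variant that groups by IPv4 address instead, so a host that shows up under more than one client name is summed into a single line, could look like this. `locks_per_ip` is a hypothetical helper that reads the saved `locks.txt`:

```shell
#!/bin/sh
# locks_per_ip FILE  (hypothetical helper)
# FILE is the saved output of "vserver locks nfsv4 show -instance"
# (locks.txt in the command above). Counts lock entries per client IPv4
# address rather than per client-name string.
locks_per_ip() {
  grep -i 'client name' "$1" |
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' |  # every IPv4 address on the line
    grep -vx '0\.0\.0\.0' |                     # drop the 0.0.0.0 placeholder
    sort | uniq -c | sort -rn                   # busiest address first
}
```

Usage: `locks_per_ip locks.txt`. Like the original command, it only sees the locks present when `vserver locks nfsv4 show` ran, so it is a snapshot rather than a history.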

Does each of these clients create several connections to that node, maybe one for every file it wants to access?

I can't move that LIF to another node, as the KB's solution suggests. I also don't think this is a bug in the client software.

The clients are Linux VMs using the kernel NFS client...

Does NFSv4.1 perhaps have more client connections available?

BR Florian

donny_lang · ‎2019-06-13

Yes, NFSv4 clients can create many connections to a node, depending on how many files they are accessing. Do any of the clients you listed hold an overwhelmingly large number of NFSv4 locks compared to the others? The KB has additional information on how to view the Lease Count (objects owned by a single client) for NFSv4 clients.

Answering that question will help to concentrate the troubleshooting efforts.
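To check whether one client dominates, the same `locks.txt` can be reduced to each client's share of all lock entries. This sketch counts lock entries only; it does not compute the Lease Count mentioned in the KB. The helper name and the 50% default threshold are arbitrary assumptions, not NetApp values:

```shell
#!/bin/sh
# lock_share FILE [THRESHOLD]  (hypothetical helper)
# FILE is the saved "vserver locks nfsv4 show -instance" output (locks.txt).
# Prints each client name with its lock-entry count and its share of all
# entries, busiest first, and marks clients at or above THRESHOLD percent.
lock_share() {
  grep 'Client Name:' "$1" |
    sed 's/^.*Client Name: *//' |
    sort | uniq -c |
    awk -v t="${2:-50}" '
      {
        count[NR] = $1
        sub(/^ *[0-9]+ /, "")        # strip the uniq -c count, keep the name
        name[NR] = $0
        total += count[NR]
      }
      END {
        for (i = 1; i <= NR; i++) {
          pct = 100 * count[i] / total
          printf "%5.1f%% %6d %s%s\n", pct, count[i], name[i], (pct >= t ? "  <-- check" : "")
        }
      }' |
    LC_ALL=C sort -rn
}
```

With the listing posted earlier in the thread, 192.168.26.141 holds 5 of 11 entries (about 45%): the largest share, but below the default threshold, which on its own does not look like a single runaway client.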

Noorain02 · ‎2021-06-17

We had a similar issue on our NetApp cDOT systems with NFSv4.0, but since we moved to NFSv4.1 last week it has not appeared again. We could not find any clear indication or statement from NetApp technical support that this is a fix; however, since moving to NFSv4.1 the error is gone.
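Before and after such a switch, it helps to confirm which NFS minor version each Linux client actually negotiated; the kernel reports it in the `vers=` option of every NFS entry in `/proc/mounts`. A minimal sketch (the function name is hypothetical):

```shell
#!/bin/sh
# nfs_mount_versions [MOUNTS_FILE]  (hypothetical helper)
# Run on a Linux NFS client. Reads /proc/mounts (or the given file) and
# prints each NFS mount point followed by the version from its "vers=" option.
nfs_mount_versions() {
  awk '$3 == "nfs" || $3 == "nfs4" {
         v = "unknown"
         n = split($4, opts, ",")          # mount options are comma-separated
         for (i = 1; i <= n; i++)
           if (opts[i] ~ /^vers=/) v = substr(opts[i], 6)
         print $2, v
       }' "${1:-/proc/mounts}"
}
```

A mount still on NFSv4.0 shows `4.0` next to its mount point. Moving it to NFSv4.1 would then mean remounting with `-o vers=4.1` (see nfs(5)), assuming NFSv4.1 is enabled on the SVM; on ONTAP that should be the `-v4.1 enabled` option of `vserver nfs modify`.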