Network and Storage Protocols

LockCount rising until storepool says goodbye

Printus-Admins

Hey folks,

 

I have a strange issue and I can't get rid of it.

 

Environment:

  • OKD-Cluster (4.20.0-okd-scos.12) 
  • NetApp OTS (NetApp Release 9.17.1P2) with dedicated SVM for the OKD Cluster
  • Trident, installed via Operator. Version: 25.6.2
  • ActiveMQ-Artemis Cluster, installed via Operator (Red Hat Integration - AMQ Broker for RHEL 9. Version: 7.13.2-opr-1+0.1761129569.p) using Trident-PVC for data 
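
For reference, version info like the above can be pulled with something along these lines (the broker namespace is a placeholder, adjust to your setup):

oc get clusterversion                          # OKD version (4.20.0-okd-scos.12)
oc get nodes -o wide                           # node OS and kernel versions
tridentctl -n trident version                  # Trident version (if tridentctl is installed)
oc -n <broker-namespace> get activemqartemis   # broker CRs created by the AMQ operator (resource name may differ by version)
oc -n <broker-namespace> get pvc               # Trident-backed PVCs used for the journal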

 

ActiveMQ starts normally and is operating as expected, but "LockCount" and "OwnerCount" are rising steadily while all other counters stay low.

 

 

otscl::*> nfs storepool show -vserver mysvm

Node: otscl1
Vserver: mysvm
Data-Ip: 192.168.1.66

Client-Ip      Protocol  IsTrunked OwnerCount OpenCount  DelegCount LockCount
-------------- --------- --------- ---------- ---------- ---------- ---------
192.168.1.67   nfs4.2    false              0         0          0         0
192.168.1.68   nfs4.2    false          26099        23          0     26099
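
To correlate the output above with what the client is actually sending, the per-op NFSv4 counters on the OKD node behind 192.168.1.68 are worth a look. A rough sketch (the node name is a placeholder, and it assumes nfsstat is available on the node; otherwise the raw counters in /proc/self/mountstats tell the same story):

# On the OKD node behind 192.168.1.68, e.g. via "oc debug node/<node>" and "chroot /host"

# Aggregate NFSv4 client op counts: open/close and lock/locku should roughly balance.
# If OPEN and LOCK keep growing while CLOSE and LOCKU stay flat, state piles up on the
# client exactly like the storepool counters suggest.
nfsstat -c -4

# Raw per-mount counters as an alternative:
grep -E 'OPEN:|CLOSE:|LOCK:|LOCKU:' /proc/self/mountstats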

 

 

When the Lock/OwnerCount hits ~131k, the following error appears:

 

otscl1 EMERGENCY Nblade.nfsV4PoolExhaust: NFS Store Pool for Owner exhausted. Associated object type is CLUSTER_NODE with UUID: XXXXXXXXXXXXXXXX.

From that point on, none of the NFSv4 shares on the OTS cluster (across all SVMs) can be accessed until we restart ActiveMQ, which resets the counters.
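
("Restart ActiveMQ" here means bouncing the broker pods, roughly like this; namespace and StatefulSet names are placeholders for whatever the AMQ operator created:)

# Restarting the brokers is what currently drops the storepool counters for us.
oc -n <broker-namespace> get statefulset
oc -n <broker-namespace> rollout restart statefulset/<broker-statefulset>
oc -n <broker-namespace> rollout status statefulset/<broker-statefulset>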

 

I also checked the locks in detail:

 

locks show -vserver mysvm
  (vserver locks show)

Notice: Using this command can impact system performance. It is recommended
that you specify both the vserver and the volume when issuing this command to
minimize the scope of the command's operation. To abort the command, press Ctrl-C.

Vserver: mysvm
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/server.lock
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.66
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/serverlock.1
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.66
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/serverlock.2
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.66
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/bindings/activemq-bindings-1.bindings
                                   mysvm_lif
                                               nfsv4.1   delegation  192.168.1.66
                Delegation Type: write
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/bindings/activemq-bindings-2.bindings
                                   mysvm_lif
                                               nfsv4.1   delegation  192.168.1.66
                Delegation Type: write
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/activemq-data-1.amq
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.66
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/activemq-data-2.amq
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.66
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/server.lock
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.67
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/serverlock.1
                                   mysvm_lif
                                               nfsv4.1   byte-range  192.168.1.66
                Bytelock Offset(Length): 0 (18446744073709551615)
                                                         share-level 192.168.1.67
                Sharelock Mode: read_write-deny_none
         /trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89/journal/serverlock.2
                                   mysvm_lif
                                               nfsv4.1   share-level 192.168.1.67
                Sharelock Mode: read_write-deny_none
trident_pvc_52530de0_c35d_4a2b_a133_d84b9ef2b9b7
         /trident_pvc_52530de0_c35d_4a2b_a133_d84b9ef2b9b7/.healthcheck
                                   mysvm_lif
                                               nfsv4.1   delegation  192.168.1.67
                Delegation Type: write
15 entries were displayed.

 

Running the AMQ operator and Trident on OpenShift (instead of OKD), the counters stay low, so I thought this could be a kernel or OS issue (kernel versions below; see the capture sketch after the list).

 

  • OKD: 6.12.0-142.el10.x86_64 #1 SMP PREEMPT_DYNAMIC 
  • OpenShift: 5.14.0-427.97.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC
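
One way to compare the two kernels further would be to look at the actual NFSv4 traffic, roughly like this (interface name is a placeholder; the interesting question is whether the 6.12 client ever sends CLOSE / LOCKU / FREE_STATEID for the state it creates):

# On one OKD node and one OpenShift node, capture a few minutes of traffic
# towards the SVM data LIF (192.168.1.66 in the storepool output above):
tcpdump -i <iface> -s 0 -w nfs-okd.pcap host 192.168.1.66 and port 2049

# Then compare the captures in Wireshark: do OPEN/LOCK compounds ever get
# matching CLOSE/LOCKU/FREE_STATEID replies on the OKD kernel?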

I installed CentOS Stream 10 (kernel 6.12.0-170), which is the base OS for OKD, set the same kernel parameters as in the cluster, mounted the Trident share (with the mount options copied from an OKD node), and deployed AMQ Artemis using the same config. The counters stay low.
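
For reference, the reproduction mount on the CentOS box looked roughly like this (the options below are examples; the real ones came from findmnt on an OKD node):

# On an OKD node: read the exact NFS mount options Trident used for the PVC
findmnt -t nfs4 -o TARGET,SOURCE,OPTIONS

# On the CentOS Stream 10 box: mount the same export with the same options
mount -t nfs4 -o vers=4.1,proto=tcp,hard,rsize=65536,wsize=65536 \
    192.168.1.66:/trident_pvc_1e201be0_e6d6_4ab2_8270_579061df7f89 /mnt/amq-test

# Confirm the negotiated version and options match the OKD node
nfsstat -m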

 

During my research, I stumbled across the following comments:

 

By Chris at 2025-03-04 17:05:36:

A not so long while ago I managed to crash a NetApp filer by upgrading a Linux host to an early 6.x kernel and connecting with new NFSv4 features. Seems like its early for all implementors 🙂

By Benjamin Coddington at 2025-10-08 14:42:11:

Just ran across this post and thought it worth mentioning that as of v6.17 there have been over 1k patches to the in-kernel linux NFS client since v5.15 and 2021.

https://utcc.utoronto.ca/~cks/space/blog/linux/NFSv4KernelStateNotImpressed?showcomments

 

Do you have any hints, tweaks, or ideas on how I could investigate this further?

 

I have already set up an NFSv4 server on Linux, which runs without any problems, so my devs are already thinking about replacing our OTS cluster 😄
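
(That comparison server was nothing special, just a minimal NFSv4 export along these lines; the path and client network are placeholders:)

# Minimal Linux NFSv4 export used for the comparison test
dnf install -y nfs-utils
echo '/srv/amq-test 192.168.1.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra
systemctl enable --now nfs-server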

 

Thank you.

1 REPLY

parisi

This tends to be a pretty common issue, as evidenced by the number of KB articles we have around it. 🙂

 

This one covers why a client might cause this problem:

https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/What_are_the_NFSv4_Storepools_why_do_they_exist

 

How can specific clients potentially cause problems?

  • In certain circumstances, clients do not close their OPENs in a way that the ONTAP node is expecting.
  • When this occurs, the client is unaware that it still has that OPEN allocated.
  • In this case, the server will not remove the OpenState object and the resource is never returned to the pool.
  • If this behavior continues, storePool exhaustion can occur as the client behavior orphans resources on the server.
  • Dumping the NFSv4 locks can show which client is taking up all of the allocations in the storePool (see the sketch after this list).
  • Once that client is restarted, ONTAP frees the storePool resources associated with it.
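
For example, something along these lines narrows the dump down per client (the vserver, volume, and cluster names are taken from your post; it assumes your ONTAP release exposes the client-address field as a filter, and scoping by vserver keeps the command cheap):

# Count lock entries per client address to spot the outlier
ssh admin@otscl "vserver locks show -vserver mysvm -fields client-address" \
  | awk '{print $NF}' | sort | uniq -c | sort -rn | head

# Then dump only that client's locks
ssh admin@otscl "vserver locks show -vserver mysvm -client-address 192.168.1.68"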

This one talks about how to troubleshoot/identify the offending client:

 

https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/How_to_identify_problematic_NFSv4_clients_consuming_storepool_resources

 

This one consolidates all the different links:

https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/NFSv4_Storepool

 

Basically, it's probably a client issue, but I would use the above information to gather data and verify. Then maybe see if restarting the client makes the issue go away. After that, you have ample evidence to convince your group to upgrade the clients. 🙂
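
And if you want early warning while you sort out the client side, a crude watch on the storepool counters will catch the problem well before the ~131k exhaustion point you saw. A sketch (it assumes key-based SSH to the cluster management LIF and that the diag-level command can be run non-interactively like this):

#!/bin/sh
# Crude storepool watch; threshold and interval are examples.
THRESHOLD=100000
while true; do
  ssh admin@otscl "set -privilege diagnostic -confirmations off; nfs storepool show -vserver mysvm"
  # ...parse OwnerCount/LockCount from the output and alert if either exceeds $THRESHOLD...
  sleep 300
done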
