We are running CDOT 9.3P2 with an SVM providing NFSv4 services.
We are running into problems with NFSv4 clients creating a huge number of locks, to the point where the SVM slows to a crawl and eventually stops responding. At some unknown point, even PowerShell get-nclocks becomes unresponsive. We just can't seem to put a finger on where that point is.
Our current plan is to migrate to CDOT 9.3P9. Has anyone else seen these issues? I have read the NFSv4 RFCs and I understand that they leave a lot to the client, but we are committed to NFSv4 because of RHEL 7 and need to stabilize our CDOT cluster.
We have seen quite a number of client-side issues that cause NFSv4 store pool exhaustion on the NetApp end, which eventually leads to an NFSv4 outage.
I would recommend opening a support case so that we can gather the store pool counter statistics, the relevant lock information, and a trace at the time of the issue to identify whether it is a client-side or server-side problem.
To resolve the NFSv4 store pool exhaustion issue, we can first try migrating the affected LIF to another node. If that doesn't resolve it, try rebooting the server, and if the locks are still in the pending-delete state, we might need to reboot the node.
You can get the details of the servers causing the highest number of NFSv4 locks with the command below.
ssh admin@node-that-impacted "set d -c off; rows 0; vserver locks nfsv4 show -inst" | tee locks.txt | grep -i "client name" | sort | uniq -c |sort -n
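Since the command above also saves the full listing to locks.txt, the same summary can be re-run offline without touching the cluster again. The following is only a rough sketch: the function name top_lock_clients is made up for illustration, and it assumes each lock record contains a line of the form "Client Name: <host>" (the exact label and layout may differ on your release, so adjust the pattern if needed).

```shell
#!/bin/sh
# Summarize a saved lock listing by client, highest lock count first.
# ASSUMPTION: each lock record has a line such as "Client Name: <host>";
# the label is matched case-insensitively, as in the original grep.
# top_lock_clients is a hypothetical helper name, not an ONTAP command.
top_lock_clients() {
    file="$1"          # saved output, e.g. locks.txt
    limit="${2:-10}"   # number of clients to list (default 10)

    tr -d '\r' < "$file" \
        | grep -i "client name" \
        | sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Example: top_lock_clients locks.txt 5
```

Each output line is a lock count followed by the client name, so the first line is the client holding the most locks. The tr step strips carriage returns, which often appear in output captured over ssh and would otherwise make identical client names count separately.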