NFS v4 infinite locks and / or memory leaks on CDOT 9.3P2

Tas
5,574 Views

We are running CDOT 9.3P2 with an SVM providing NFS v4 services.

We are running into problems with NFSv4 clients creating a huge number of locks, to the point where the SVM slows to a crawl and eventually stops responding. At some point even the PowerShell get-nclocks cmdlet becomes unresponsive. We just can't seem to put a finger on where that point is.

 

Our current plan is to migrate to CDOT 9.3P9. Has anyone else seen these issues? I have read the NFSv4 RFCs and I understand that the protocol leaves a lot to the client, but we are committed to NFSv4 due to RHEL 7 and need to stabilize our CDOT cluster.

TasP

1 ACCEPTED SOLUTION

Tas
5,205 Views

Actually, I may have good news. We upgraded to CDOT 9.3P8 about four weeks ago, and all NFS v4 issues appear to have been resolved. Now if we can just get FG support for NFS v4. 😉

 

Tas

4 REPLIES

Vijay_ramamurthy
5,433 Views

Hi Tas,

We have seen quite a number of client-side issues that exhaust the NFS store pool on the NetApp end, which eventually leads to an NFSv4 outage.

I would recommend opening a support case so that we can gather the store pool counter statistics, the relevant lock information, and a trace taken at the time of the issue to identify whether it is a client-side or server-side problem.

Dileep
5,088 Views

We are running 9.1P2 and have been facing the same issue intermittently. There seems to be a bug that hits the filer port.

Here is the bug detail:

 

https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1141614

 

To resolve the NFSv4 store pool exhaustion issue, first try migrating the affected LIF to another node. If that doesn't resolve it, try rebooting the server; if the locks are still in a pending-delete state after that, we might need to reboot the node.

You can find the servers holding the highest number of NFSv4 locks with the command below.

 

ssh admin@node-that-impacted  "set d -c off; rows 0; vserver locks nfsv4 show -inst" | tee locks.txt | grep -i "client name" | sort | uniq -c |sort -n
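
Since the command above also saves the full output to locks.txt, the same count can be re-run offline later. The following is only a rough sketch: the function name top_lock_clients is made up for illustration, and it assumes each lock record contains a line of the form "Client Name: <host>" (inferred from the grep above, not checked against every ONTAP release).

```shell
#!/bin/sh
# Hypothetical helper: list the clients holding the most NFSv4 locks,
# busiest first, from a saved locks.txt.
# Assumption: every lock record has a line like "Client Name: <host>".
top_lock_clients() {
    file=$1
    limit=${2:-10}                         # show the top 10 unless told otherwise
    grep -i "client name" "$file" |
        tr -d '\r' |                       # drop stray carriage returns, if any
        sed 's/^[^:]*:[[:space:]]*//' |    # keep only the value after the first colon
        sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, `top_lock_clients locks.txt 5` would print the five client names with the highest lock counts, one per line, with the count in the first column.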

Sree_B
5,083 Views

Thanks Dileep. We were also facing the same issue, and this has helped us enormously.