ONTAP Discussions

What does "outage during takeover" mean for NFSv4?

grocanar

Hi,

I'm using ONTAP 9.11 and considering migrating from NFSv3 to NFSv4.

I'm carefully reading the following document:

NFS in NetApp ONTAP
Best practice and implementation guide

https://www.netapp.com/media/10720-tr-4067.pdf

On page 64 it talks about a potential outage during takeover.
For a failover, it says:
Lock state is not moved; up to 45s
outage when locks are in use.

Does that mean the whole NetApp system is unreachable for up to 45 seconds, or only the file that the lock is held on?
The impact is not the same in the two cases.

1 ACCEPTED SOLUTION

Ontapforrum

@heightsnj: To be very honest, admins who have migrated from NFSv3 to NFSv4/4.1 and seen this issue first hand will be able to share more useful information. There is plenty of information about NFSv4/4.1 (mostly about key enhancements over previous versions), but the real test is probably when a failover or reboot happens on the NFS server side (the NetApp storage). In earlier NFS versions (v2 and v3), NLM/NSM took care of the locking issues.

Coming to the BURT: for disruptions caused specifically by the combination of NFSv4.1 and VMware ESXi 6.x+ datastores, I see there are KBs that suggest it has been fixed. It appears to be a VMware bug that incorrectly handled the responses from the server.

 

https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/VMware_NFSv4.1_datastores_see_disruption_during_failover_events_for_ON...


https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Top_ESXi_known_issues%2F%2Fworkarounds%2F%2Fbest_practices_for_NFSv4.1

 

Further, looking at bug 1499441 (which you shared), "ESXi clients lose NFS 4.1 locks during takeover" (updated 12-Feb-2023), it sounds like this may have been fixed on the ONTAP side as well. I would suggest raising a ticket so that you can track this issue directly and get an update.

On a side note: I was reading the BURT (1499441) you shared. It says NFS 4.1 in the subject, but the summary refers to NLM. I thought NLM was only applicable to NFSv2/v3, so this part is confusing to me. I don't know the whole context of this particular bug, though.


7 REPLIES

MaGr

Hi,

The locking mechanism has changed in NFSv4. Now the file remains locked until the timeout.

If there is no urgent need, I would not change to NFSv4.

 

Marcus

heightsnj

>> Now the file remains locked until timeout.

When the timeout (45s) is reached, the lock on the file is released/lost, and that is when the outage happens. Is this why an outage could happen?

Ontapforrum

Yes, this is file locking under a lease-based model. It is not that the NetApp is unreachable; rather, it is the grace period (30 seconds lease + 15 seconds grace = 45 seconds total) given to clients to reclaim their locks. As the total timeout is 45 seconds, in the best-case scenario you expect the server/LIF to be up again within that window.
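The arithmetic in this reply can be sketched as a small check. This is only an illustration of the numbers quoted above (30 s lease, 15 s grace). The function names and the idea of comparing the window against a measured failover duration are assumptions made for the example, not ONTAP behaviour or an ONTAP API.

```python
LEASE_SECONDS = 30  # lease time quoted in the reply above
GRACE_SECONDS = 15  # grace time quoted in the reply above


def reclaim_window_seconds(lease=LEASE_SECONDS, grace=GRACE_SECONDS):
    """Total time the reply gives clients to reclaim their locks."""
    return lease + grace


def server_back_in_time(failover_seconds, lease=LEASE_SECONDS, grace=GRACE_SECONDS):
    """True if the server/LIF is back before the reclaim window ends."""
    return failover_seconds <= reclaim_window_seconds(lease, grace)


if __name__ == "__main__":
    # Hypothetical failover durations, in seconds.
    for duration in (10, 45, 60):
        print(duration, server_back_in_time(duration))
```

With these numbers, a failover that completes in 10 seconds falls inside the window, while one that takes 60 seconds does not.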

 

In general - NFSv4 locks provide a time-bounded grant of control over file state to an NFS client. Holding a lease allows a client to assume that its lock remains valid for a server-specified, renewable time interval.
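The lease idea described here can be modelled in a few lines. This is a deliberately simplified sketch: the `Lease` class, its fields, and the rule that an expired lease cannot be renewed are assumptions chosen for illustration and do not reproduce the full NFSv4 state machine or any ONTAP internals.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical model of a renewable, time-bounded lease."""

    duration: float          # server-specified lease length, in seconds
    granted_at: float = 0.0  # time of the last grant or renewal

    def is_valid(self, now: float) -> bool:
        # The client may assume its locks are valid only inside this interval.
        return now - self.granted_at < self.duration

    def renew(self, now: float) -> bool:
        # In this simplified model, renewal only succeeds while the lease is valid.
        if not self.is_valid(now):
            return False
        self.granted_at = now
        return True


if __name__ == "__main__":
    lease = Lease(duration=30)
    print(lease.renew(now=20))     # renewed inside the interval
    print(lease.is_valid(now=45))  # 25 s after the renewal
    print(lease.is_valid(now=50))  # 30 s after the renewal
```

A client that keeps renewing before the interval ends keeps its lease; one that cannot reach the server for longer than the interval has to reclaim its state afterwards.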

 

The following KBs may be worth considering:


How does the NFSv4 grace periods work?
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/How_does_the_NFSv4_grace_periods_work

 


Why may NFSv3 perform better than NFSv4.x?
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Why_may_NFSv3_perform_better_than_NFSv4.x

 

heightsnj

Yeah, we have had a lot of discussions about converting v3 to v4 as well. It is indeed confusing.

As we've discussed in this thread, if a client holds a lock on a file and a failover (of either LIFs or nodes) takes longer than 45s, the client may lose its connection to the NetApp NFS server.

However, people on the other side are saying the issue was resolved by this link:
https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1499441

Does the fix in this link really resolve the issue we have discussed?

heightsnj

1499441 - ESXi clients lose NFS 4.1 locks during takeover

Bug ID: 1499441
Title: ESXi clients lose NFS 4.1 locks during takeover
Status: Fixed
Severity: P2
Found In Versions: 9.11.1P2, 9.8P14, 9.9.1, 9.9.1P2, 9.9.1P6
Last Update: 12-Feb-2023
Summary: When the deny mode bit is set to "lock object", network lock manager rejects multiple attempts to reclaim the locking state with the share lock conflict error, 102.
Fixed In Versions: 9.11.1P7, 9.12.1, 9.12.1P1, 9.9.1P14 (for minimum recommended release, please visit SU2)
Workaround: None
Created On: 26-Aug-2022
Notes: None
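To check a running release against the "Fixed In Versions" list in this bug record, a minimal sketch could look like the following. The helper names are hypothetical, the parser only accepts strings of the form X.Y.Z or X.Y.ZPn, and the comparison is limited to releases in the same train as a listed fix. A result of False for a train that is not listed means "not stated in this record", not "affected".

```python
import re

# "Fixed In Versions" copied from the bug record above.
FIXED_IN = ["9.11.1P7", "9.12.1", "9.12.1P1", "9.9.1P14"]

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:P(\d+))?$")


def parse_version(text):
    """Split 'X.Y.Z' or 'X.Y.ZPn' into ((X, Y, Z), n); n is 0 without a P suffix."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance)), int(patch or 0)


def listed_as_fixed(version, fixed_in=FIXED_IN):
    """True if a listed fix exists in the same train at this patch level or lower."""
    train, patch = parse_version(version)
    for fixed in fixed_in:
        fixed_train, fixed_patch = parse_version(fixed)
        if train == fixed_train and patch >= fixed_patch:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("9.11.1P2", "9.11.1P7", "9.12.1"):
        print(candidate, listed_as_fixed(candidate))
```

For the authoritative answer on a given system, the bug page and a support case remain the reference; this only automates reading the list.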

MaGr

Hi,

We are on NFSv3 and have never had issues with file locking as described in the table on page 64 of TR-4067 (up to 45s on NFSv3).

Will it also run well with v4? The facts are too unclear for me to migrate.

 

Marcus