ONTAP Discussions

Using NFS 4.1 and VMware vSphere - lock issue

ARUM

Hi,

After a controller failover caused by a hardware issue, some of our virtual machines were stopped. The NetApp and VMware KBs seem to recommend avoiding NFS 4.1.

NetApp KB : VMware NFSv4.1 based virtual machines are powered off after ONTAP 9 storage failover events

VMware KB : 2089321

According to VMware, the problem is that the lock is released because the grace period set by the NFS server is shorter than the takeover duration (if I understand these bulletins correctly). This is our case: the takeover took more than 96 seconds, while the lock is held for up to 90 seconds (TR-4067).

According to NetApp (KB), if we need high availability, we should avoid NFS 4.1. According to VMware (KB), it works as designed, so we should not wait for a fix.
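To make the timing argument concrete, here is a minimal sketch of the comparison, using only the two figures quoted in this post (a takeover of at least 96 s and locks held for up to 90 s). The function name is made up for illustration, and the real values depend on how the NFS server is configured.

```python
def locks_survive_takeover(takeover_seconds: float, lock_hold_seconds: float) -> bool:
    """Return True if the takeover completes while the server still holds the locks."""
    return takeover_seconds <= lock_hold_seconds


# Figures from this post: the takeover took more than 96 s,
# and the lock is held for up to 90 s.
takeover = 96
lock_hold = 90

print(locks_survive_takeover(takeover, lock_hold))    # False
print(f"shortfall: at least {takeover - lock_hold} s")  # shortfall: at least 6 s
```

With these numbers the lock expires before the takeover finishes, which matches what the KBs describe.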

 

Am I the only one using vSphere with NetApp/NFS 4.1? I have more than 400 VMs hosted on NFS 4.1 datastores. Rather than migrating to NFSv3 datastores, could I increase the lock grace period? To which value?

1 ACCEPTED SOLUTION

TMAC_CTG

This is a known issue. NFSv4.1 has never really been recommended with ESXi, especially with versions before 7.0. Lots of bad/weird things happen due to incompatibilities between the implementations. ONTAP uses Parallel NFS (pNFS), and I think the term used by VMware is session trunking. The articles go into detail about the nuances.

 

Anyway, two options....

(1) Upgrade ESXi to the latest version of 7.x *and* update to the latest version of the NFS VAAI VIB for ESXi *and* upgrade ONTAP to 9.8. I am not sure there is a painless way to get there while staying on NFSv4.1, though, and I would not be sure which order to recommend. Maybe ESXi, then the VIB, then ONTAP?

(2) Create a bunch of NFSv3 datastores, Storage vMotion everything off the v4 mounts, and then unmount/delete the v4 volumes when complete. Once all the volumes are evacuated and unmounted, you may even want to disable NFSv4 on the ONTAP SVM that serves NFS to ESXi.

 

I have seen way too many customers hit this and the v3 migration is by far the easier fix.
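As a rough sketch of option (2), the snippet below only builds the command strings one might run; it does not execute anything. It assumes the standard `esxcli storage nfs add` command for mounting an NFSv3 datastore and the ONTAP `vserver nfs modify` command for turning off NFSv4.x. The LIF address, junction path, datastore name and SVM name are hypothetical placeholders, so check the exact syntax against your ESXi and ONTAP versions before using it.

```python
import shlex


def nfs3_mount_command(lif: str, junction_path: str, datastore: str) -> str:
    """Build the esxcli command that mounts an export as an NFSv3 datastore."""
    args = [
        "esxcli", "storage", "nfs", "add",
        "--host", lif,               # data LIF of the SVM (placeholder)
        "--share", junction_path,    # junction path of the new volume (placeholder)
        "--volume-name", datastore,  # datastore name as seen by ESXi (placeholder)
    ]
    return shlex.join(args)


def disable_nfsv4_command(svm: str) -> str:
    """Build the ONTAP command that disables NFSv4.0 and NFSv4.1 on an SVM.

    Only run this once every v4 datastore has been evacuated and unmounted.
    """
    return f"vserver nfs modify -vserver {svm} -v4.0 disabled -v4.1 disabled"


# Hypothetical names, for illustration only.
print(nfs3_mount_command("192.0.2.10", "/ds_nfs3_01", "ds_nfs3_01"))
print(disable_nfsv4_command("svm_esxi_nfs"))
```

The Storage vMotion step in between is left out on purpose, since it is normally driven from vCenter rather than from the host shell.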


5 REPLIES 5

I will avoid vSphere 7/ONTAP 9.8/NFS 4.1. I don't want another migration if there are new issues, and the KB note applies to "VMware ESXi 6 and higher". So I will use NFSv3, even though there is still a big problem with backups (https://kb.vmware.com/s/article/2010953).

Ontapforrum

Another NetApp KB: (This issue has been resolved with the release of ESXi 6.7P02 and ESXi 7.0 GA)
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/VMware_NFSv4.1_datastores_see_disruption_during_failover_events_for_ON...

 

As stated in the previous response, a lot of customers have seen this issue. I believe you can find long forum threads about it, but the KB says it is a VMware defect?

Thank you, I know this KB note. We updated last year as soon as a fix was published.

TMAC_CTG

Like the KB says, you do not have to use NFSv4.

 

Alternatively, you can use the LAN/NBD transport, which uses NFC (Network File Copy), in your backup solution, or disable SCSI hot-add through the backup software.

 

I have had customers use this method, and it seems to stop the stunning.
