2015-10-13 06:30 AM
Unable to mount an NFSv4.1 datastore on ESXi 6.0 (ONTAP version cDOT 8.2)
I have a two-node cluster with an SVM that has NFSv4 and NFSv4.1 enabled. I created a volume and an export policy. When I try to add the volume to ESXi 6.0 as an NFSv4.1 datastore, vCenter Server throws the following error:
An error occurred during host configuration.
Operation failed, diagnostics report: Sysinfo error on operation returned status : Timeout. Please see the VMkernel log for detailed error information
I checked the VMkernel log on the ESXi host. Here is the relevant snippet:
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)World: 15446: VC opID 31680480-165a-4557-ada1-3be15742d33f-21187-ngc-df-8a-2a79 maps to vmkernel opID 5d6b2dd2
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)NFS41: NFS41_VSIMountSet:402: Mount server: xx.xx.xx.xx, port: 2049, path: /vol_svm1_nfs4, label: testinggg-nfs4, security: 1 user: , options: <no
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)StorageApdHandler: 982: APD Handle Created with lock[StorageApd-0x4304273a7130]
2015-10-13T13:20:13.514Z cpu2:129143)WARNING: SunRPC: 3947: fail all pending calls for client 0x4302ddb7e2a0 (socket disconnected)
2015-10-13T13:20:18.512Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41FSWaitForCluster:3433: Failed to wait for the cluster to be located: Timeout
2015-10-13T13:20:18.512Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41_FSMount:4412: NFS41FSDoMount failed: Timeout
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)StorageApdHandler: 1066: Freeing APD handle 0x4304273a7130 
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)StorageApdHandler: 1150: APD Handle freed!
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41_VSIMountSet:410: NFS41_FSMount failed: Timeout
I could not work out from the log what the problem is. Any help resolving this would be appreciated.
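To make a log like this easier to read, a small Python sketch can pull out the WARNING lines and show how many seconds after the mount request each one occurred. It only reads timestamps from an excerpt of the lines quoted above (trimmed, with the truncated options field dropped), assumes the first line is the mount request, and does not diagnose anything by itself.

```python
from datetime import datetime

# Excerpt of the VMkernel log lines quoted above, trimmed to what is parsed here.
LOG = """\
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)NFS41: NFS41_VSIMountSet:402: Mount server: xx.xx.xx.xx, port: 2049, path: /vol_svm1_nfs4
2015-10-13T13:20:13.514Z cpu2:129143)WARNING: SunRPC: 3947: fail all pending calls for client 0x4302ddb7e2a0 (socket disconnected)
2015-10-13T13:20:18.512Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41FSWaitForCluster:3433: Failed to wait for the cluster to be located: Timeout
2015-10-13T13:20:18.512Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41_FSMount:4412: NFS41FSDoMount failed: Timeout
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41_VSIMountSet:410: NFS41_FSMount failed: Timeout
"""


def parse_ts(line):
    # VMkernel timestamps are UTC with millisecond precision, e.g. 2015-10-13T13:19:58.511Z
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize(log):
    """Return (seconds since the first line, warning text) for every WARNING line."""
    lines = [line for line in log.splitlines() if line.strip()]
    start = parse_ts(lines[0])  # assumed to be the mount request
    events = []
    for line in lines:
        if "WARNING: " in line:
            offset = (parse_ts(line) - start).total_seconds()
            events.append((offset, line.split("WARNING: ", 1)[1]))
    return events


for offset, message in summarize(LOG):
    print(f"+{offset:6.3f}s  {message}")
```

On this excerpt it shows the RPC socket to the server disconnecting about 15 s after the mount started and the NFSv4.1 mount giving up at about 20 s, which suggests checking the network path and session setup to the data LIF before the export path itself.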
2015-10-20 01:40 PM
Re-create the error, then generate an ASUP of type "all" and PM me the SN/sysid so I can have a look.
::> autosupport invoke * -type all
If you can get a packet trace during the issue, even better.
A few questions:
- What is the mount syntax?
- Are you mounting via VSC?
- What data LIF are you trying to mount?
- What volume?
- Does the mount work from regular Linux clients via v4.1 (e.g., RHEL)?
- Does the mount work via NFSv3 in ESX?
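Because the VMkernel log shows the RPC socket disconnecting, one quick first check on the data LIF question is whether a TCP connection to port 2049 on that LIF can be opened at all. This minimal Python sketch tests plain TCP reachability from whatever machine runs it (ideally one on the same network as the ESXi NFS VMkernel port, which is an assumption about your setup); it says nothing about NFSv4.1 sessions, exports or Kerberos.

```python
import socket


def nfs_port_reachable(lif_ip, port=2049, timeout=5.0):
    """Return True if a TCP connection to the NFS port on lif_ip succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((lif_ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, call `nfs_port_reachable("192.0.2.10")` with the LIF's real address in place of that documentation placeholder. A False result points at routing, VLAN or firewall issues rather than ONTAP export configuration.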
Once I have the ASUP I can try to piece together what (if any) config issues you might have and review logs.
2017-09-28 07:36 PM
Did you ever find a solution to this problem?
I am having exactly the same issue with an ONTAP 9 cluster and ESXi 6.5, using NFSv4.1 with Kerberos authentication through our Active Directory.
I can authenticate via Kerberos from the ESXi host using "kinit (username)", but mounting an NFSv4.1 datastore returns the same error you saw:
WARNING: SunRPC: 3948: fail all pending calls for client 0x430383376140 (socket disconnected).
Was there a way to get around this?
2017-10-02 01:28 AM
Hi Chris, unfortunately NFSv4.1 is not fully supported yet; for example, there is no multipathing and no mixing of protocols on the ESXi host.
Page 17 of TR-4597, VMware vSphere with ONTAP (http://www.netapp.com/us/media/tr-4597.pdf), does a good job of explaining how to connect and which features are supported.
Hopefully this can help.
2017-11-02 08:22 PM - edited 2017-11-02 08:23 PM
I managed to get it all working, and I have since deployed vSIMs over and over from scripted deployments with it working every time.
The keys were:
1. Make sure each NFS LIF on the SVM has its own unique SPN. Do not use the same SPN for multiple LIFs if you want ESXi to mount NFS datastores through both LIFs. If the LIFs share an SPN, the ESXi host must use whichever LIF it mounted through first for every other datastore, so there is no way to balance datastores across LIFs. I guess the same SPN on every LIF is required for pNFS, but it really screws up ESX.
2. Make sure the DNS host (A) record and the reverse lookup (PTR) record for each LIF IP match one of the SPNs for that LIF. I didn't really understand this when I was trying to set it up.
3. Make sure you allow krb5 and krb5i in your NFS export policy rules, but you MUST also include "sys" as an allowed method for superuser. If you leave superuser at "any" in the export policy, you will be unable to clone any VM on an NFSv4.1 datastore using Kerberos. You can create, delete, power on and snapshot VMs, but cloning fails. It took me a while to work out the cause of that one.
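The three tips above can be turned into a quick pre-flight check. This is a minimal Python sketch, not a NetApp or VMware tool. It assumes the LIF-to-SPN map and the export rule are typed in by hand from the cluster configuration (the `rorule`/`rwrule`/`superuser` keys mirror the ONTAP export-policy rule parameters), and every LIF name, host name, IP and realm in it is hypothetical. The "superuser must include sys" check encodes this post's experience, not documented behaviour.

```python
import socket
from collections import defaultdict

# Hypothetical example data: replace with your own LIFs, SPNs and export rule.
LIF_SPNS = {
    "lif1": "nfs/svm1-lif1.example.com@EXAMPLE.COM",
    "lif2": "nfs/svm1-lif2.example.com@EXAMPLE.COM",
}
EXPORT_RULE = {"rorule": ["krb5", "krb5i"], "rwrule": ["krb5", "krb5i"], "superuser": ["any"]}


def shared_spns(lif_spns):
    """Tip 1: return {spn: [lifs]} for every SPN assigned to more than one LIF."""
    by_spn = defaultdict(list)
    for lif, spn in lif_spns.items():
        by_spn[spn].append(lif)
    return {spn: sorted(lifs) for spn, lifs in by_spn.items() if len(lifs) > 1}


def spn_host(spn):
    """Host part of an SPN of the form service/host@REALM, lower-cased."""
    return spn.split("@", 1)[0].split("/", 1)[1].lower()


def dns_matches_spn(lif_ip, spn, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Tip 2: the SPN host's A record must include lif_ip, and lif_ip's PTR must name that host."""
    host = spn_host(spn)
    try:
        _, _, forward_ips = forward(host)
        ptr_name, _, _ = reverse(lif_ip)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    return lif_ip in forward_ips and ptr_name.rstrip(".").lower() == host


def export_rule_problems(rule):
    """Tip 3: krb5/krb5i allowed for ro/rw access, and "sys" listed explicitly for superuser."""
    problems = []
    for field in ("rorule", "rwrule"):
        flavors = set(rule.get(field, []))
        if "any" not in flavors and not {"krb5", "krb5i"} <= flavors:
            problems.append(f"{field} should allow krb5 and krb5i")
    if "sys" not in rule.get("superuser", []):
        problems.append('superuser should explicitly include "sys"')
    return problems


if __name__ == "__main__":
    print("shared SPNs:", shared_spns(LIF_SPNS))
    print("export rule problems:", export_rule_problems(EXPORT_RULE))
```

The DNS check performs real lookups by default, so run `dns_matches_spn("<LIF IP>", LIF_SPNS["lif1"])` for each LIF from a machine that uses the same DNS servers as the ESXi hosts; the resolver functions are parameters only so the check can be tried offline.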