Unable to attach a datastore using the NFSv4.1 protocol on ESXi 6.0, ONTAP version cDOT 8.2
I have a two-node cluster, and the SVM has both NFSv4 and NFSv4.1 enabled. I created the volume and an export policy. Now I am trying to add the datastore to ESXi 6.0 using NFSv4.1, but vCenter Server throws the error below:
An error occurred during host configuration. Operation failed, diagnostics report: Sysinfo error on operation returned status : Timeout. Please see the VMkernel log for detailed error information
I checked the VMkernel log on the ESXi host; a snippet is below.
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)World: 15446: VC opID 31680480-165a-4557-ada1-3be15742d33f-21187-ngc-df-8a-2a79 maps to vmkernel opID 5d6b2dd2
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)NFS41: NFS41_VSIMountSet:402: Mount server: xx.xx.xx.xx, port: 2049, path: /vol_svm1_nfs4, label: testinggg-nfs4, security: 1 user: , options: <no
2015-10-13T13:19:58.511Z cpu3:138178 opID=5d6b2dd2)StorageApdHandler: 982: APD Handle Created with lock[StorageApd-0x4304273a7130]
2015-10-13T13:20:13.514Z cpu2:129143)WARNING: SunRPC: 3947: fail all pending calls for client 0x4302ddb7e2a0 (socket disconnected)
2015-10-13T13:20:18.512Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41FSWaitForCluster:3433: Failed to wait for the cluster to be located: Timeout
2015-10-13T13:20:18.512Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41_FSMount:4412: NFS41FSDoMount failed: Timeout
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)StorageApdHandler: 1066: Freeing APD handle 0x4304273a7130
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)StorageApdHandler: 1150: APD Handle freed!
2015-10-13T13:20:18.513Z cpu0:138178 opID=5d6b2dd2)WARNING: NFS41: NFS41_VSIMountSet:410: NFS41_FSMount failed: Timeout
I could not work out from the log what the problem is. Any help resolving this would be appreciated.
I managed to get it all working, and I have since been able to deploy vSIMs over and over from scripted deployments and get it working every time.
The keys were:
1. Make sure each NFS LIF on the SVM has its own unique SPN. Do not use the same SPN for multiple LIFs if you want ESXi to mount NFS datastores from both LIFs. If they share an SPN, then whichever LIF you mount through first, the ESXi host must use that same LIF for every other mounted datastore, so there is no way to balance datastores across LIFs. I gather the same SPN on every LIF is required for pNFS, but it really screws up ESXi.
2. Make sure the DNS host (A) record and reverse (PTR) lookup for each LIF IP match one of the SPNs for that LIF. I didn't really understand this when I was first trying to set it up.
3. Make sure you allow krb5 and krb5i in your NFS export policies, but you MUST also include "sys" as an allowed method for superuser. If you leave superuser at "any" in the export policy, you will be unable to clone any VM on an NFSv4.1 datastore using Kerberos. You can create VMs, delete them, power them on, and snapshot them, but cloning fails. It took me a while to work out the cause of that one.
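For point 1, the per-LIF SPN setup can be sketched with the ONTAP CLI roughly as below. All names (svm1, lif1/lif2, example.com, EXAMPLE.COM) are placeholders, and the command family changed name across releases, so verify the exact syntax on your version; on cDOT 8.2 it is the "vserver nfs kerberos-config" family, renamed to "vserver nfs kerberos interface" in later releases.

```shell
# Placeholder SVM/LIF/realm names throughout - substitute your own.
# Enable Kerberos on each NFS LIF with a UNIQUE SPN per LIF:
vserver nfs kerberos-config modify -vserver svm1 -lif lif1 \
    -kerberos enabled -spn nfs/lif1.example.com@EXAMPLE.COM
vserver nfs kerberos-config modify -vserver svm1 -lif lif2 \
    -kerberos enabled -spn nfs/lif2.example.com@EXAMPLE.COM

# Verify that no two LIFs ended up sharing an SPN:
vserver nfs kerberos-config show -vserver svm1
```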
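For point 2, a quick sanity check (with a made-up hostname and IP for illustration) is to confirm that forward and reverse DNS agree and that the returned name matches one of the LIF's SPNs:

```shell
# Forward (A) lookup must return the LIF IP:
nslookup lif1.example.com

# Reverse (PTR) lookup of the LIF IP must return the same name,
# and that name must appear in one of the LIF's SPNs:
nslookup 192.0.2.11
```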
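For point 3, an export-policy rule along these lines worked for me; the vserver, policy name, and client subnet are placeholders. The essential parts are the krb5/krb5i access rules plus "sys" in the superuser list:

```shell
# Placeholder vserver/policy/subnet - substitute your own.
# Note "sys" in -superuser: without it, VM clones on a Kerberos
# NFSv4.1 datastore fail even though create/delete/snapshot work.
vserver export-policy rule create -vserver svm1 -policyname vmware_krb \
    -clientmatch 192.0.2.0/24 -protocol nfs4 \
    -rorule krb5,krb5i -rwrule krb5,krb5i -superuser krb5,krb5i,sys
```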
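Once all of that is in place, you can also test the mount directly from the ESXi shell instead of going through vCenter, which gives faster feedback when it fails. Server, share, and label below are placeholders, and I believe -a selects the security flavor on ESXi 6.0, but check `esxcli storage nfs41 add --help` on your build:

```shell
# Placeholder server address, export path, and datastore label.
esxcli storage nfs41 add -H 192.0.2.11 -s /vol_svm1_nfs4 -v test-nfs41 -a SEC_KRB5

# Confirm the datastore mounted:
esxcli storage nfs41 list
```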