We have noticed in VMware (version 5.5 Update 3 with NFS datastores) that during a controller failover (e.g. during testing or a Data ONTAP upgrade), a few datastores experience an All Paths Down (APD) event lasting about 10 seconds.
NFS.MaxQueueDepth is currently set to 64.
The APD events only occur on controller failover and not during normal operations.
Has anyone experienced this during controller failovers, and has anyone succeeded in eliminating APD events by dropping NFS.MaxQueueDepth to 32 or lower?
We don't see any APDs during a LIF migrate operation.
The network components all look good.
We found in the logs that during the failover the NFS service takes about 5 seconds to start up on the active node. We are going to drop NFS.MaxQueueDepth to 32 on one ESXi host and test failover again.
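For anyone trying the same test, here is a minimal sketch of checking and lowering the setting on a single host with esxcli. It assumes shell (SSH or ESXi Shell) access to the host you are testing on and changes only that one host:

```shell
# Show the current value of the NFS.MaxQueueDepth advanced setting on this host
esxcli system settings advanced list -o /NFS/MaxQueueDepth

# Lower it to 32 on this host only; other hosts keep their current value
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 32
```

Depending on the build, the new value may only take effect after a host reboot, so re-run the list command (and check VMware's guidance) before repeating the failover test.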
I had the same issue; we opened tickets with NetApp and VMware.
It came down to the failover time between controllers. Sometimes the failover was quick (and then we didn't see any APD); other times it was slightly slower (depending on how busy the controller was at the time), which led to an APD. There was no guarantee whether a given failover would be fast or slow; it really comes down to how busy the controller is. After a failover, you can see the failover times for individual protocols with ::> event log show -event *nfs*
Out of interest, which controllers are you running, and what is their utilization like?