We have seen an interesting issue during ONTAP upgrades in our OSP environment. While upgrading from 9.3 to 9.5, across multiple clusters in multiple locations, random OSP compute nodes end up reporting "NFS not responding" and their VMs hang. The only recovery the OSP team has found is to reboot the affected compute nodes, which impacts every VM on them, including those that were still running fine.

We have only seen this behavior during upgrades, and only on OSP compute nodes. Other, what I will call "normal", NFS or CIFS file shares are not impacted at all, and there is even evidence that NFS-backed VMware datastores saw no impact. Another interesting fact is that we can't reproduce this any way other than during the upgrade. We have had to do failovers for DIMM replacements and ran multiple tests with manual LIF migrations, and never saw a single issue. It only happens during the upgrade.

Has anyone else seen this? What did you find to be the cause?
Note: I can't share much detail about our environment for security reasons, so I have to keep this generalized.