Hi Folks,
Apologies if this isn't the right venue since this is probably more about OS admin than NetApp specifically, but we are using NetApp ONTAP for NFS NAS so I figured I'd try because the quality of response here has been great.
We've just had a minor planned network "blip" (5s-15s) that couldn't be avoided due to where the change was happening on the network core. I didn't think it would be a big deal but we still ended up having a ton of hung NFSv3 hard mounts that couldn't be resolved without rebooting the affected Linux systems.
The systems ranged from Debian 8 / CentOS 6 to Ubuntu 20.04. Mounts are defined in fstab with varying options, from simply 'defaults' to '_netdev,nofail'. Some hosts weren't affected at all.
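Since the options vary so much between hosts, a rough way to compare affected and unaffected systems might be to dump the NFS entries and their options from each one. This is only a minimal sketch: the function name `nfs_entries` and the sample entries (server name, export paths, mount points) are hypothetical and not our real config.

```python
# Sketch: list NFS entries in fstab-style text together with their options.
# SAMPLE is made-up data for illustration, not taken from any real host.

def nfs_entries(fstab_text):
    """Return (mount_point, fs_type, options) for each NFS line."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        entries.append((fields[1], fields[2], fields[3].split(",")))
    return entries


SAMPLE = """\
# hypothetical entries
svm1:/vol/data   /mnt/data   nfs   defaults          0 0
svm1:/vol/home   /mnt/home   nfs   _netdev,nofail    0 0
/dev/sda1        /           ext4  errors=remount-ro 0 1
"""

if __name__ == "__main__":
    for mount_point, fs_type, options in nfs_entries(SAMPLE):
        print(mount_point, fs_type, ",".join(options))
```

The same parser should also work on the contents of /proc/mounts, which uses the same field layout and shows the options actually in effect rather than what fstab asked for.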
It's 2021 and to me it seems wild that a modern Linux OS can't gracefully recover from a 5-15s outage. Am I missing some tricks for getting more graceful recovery on these systems? Was there something I could do from the client end to get the mount to drop completely and remount?
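For what it's worth, one thing I could do from the client end is at least detect which mounts are hung without wedging my own shell. Below is a minimal sketch, assuming that a stat on a hung hard mount blocks indefinitely. The stat runs in a child process so only the child gets stuck. The function name `mount_responds` and the 5 second timeout are my own choices, and I haven't verified that the kill actually takes effect on a truly hung mount.

```python
import subprocess
import sys
import time


def mount_responds(path, timeout_s=5.0):
    """Return True if a stat of `path` in a child process succeeds in time.

    Returns False if the stat fails (e.g. the path does not exist) or if
    it has not finished within `timeout_s` seconds.
    """
    # Stat in a child so that a hung mount blocks the child, not this script.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return child.returncode == 0
        time.sleep(0.1)
    # Best effort only; deliberately not waiting on the child afterwards,
    # since a wait could block if the child cannot be killed.
    child.kill()
    return False


if __name__ == "__main__":
    for mount_point in sys.argv[1:]:
        state = "ok" if mount_responds(mount_point) else "not responding"
        print(mount_point, state)
```

This only tells me which mounts are stuck; it doesn't recover them, which is the part I'm really asking about.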
Longer term I'm hoping to move clients over to a new SVM that enforces NFSv4.1+, but that's still a way off and I'll have to contend with NFSv3 for some time in this infra. Then again I'm not sure that changing the protocol version will help much - I'm not an NFS guru.
Thanks everyone!