Since upgrading to ONTAP 9.3, OCUM intermittently reports "Cluster not reachable". Every 10-15 minutes OCUM raises a "Cluster not reachable" alert (new) that clears almost immediately (obsolete). Pinging the cluster shows that the mgmt interface is not even skipping a beat. OCUM also says the pairing status is bad. This has now occurred on two FAS systems, in both cases as soon as we upgraded from 9.1 to 9.3P2.
OCUM 6.4P1 and OCUM 7.3 both show the same behavior. (We are in the middle of an OCUM version transition.)
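A clean ping only proves that ICMP gets through. It says nothing about the TCP connection OCUM makes to the cluster management LIF. As a rough cross-check, a small probe like the one below can be left running from the OCUM server to see whether TCP connects stall or fail on the same 10-15 minute rhythm. This is only a sketch. Port 443 (HTTPS) is an assumption about how OCUM talks to the cluster in your setup, and the function names and the host name in the usage note are made up for illustration.

```python
import socket
import time


def tcp_probe(host, port=443, timeout=5.0):
    """Try one TCP connect; return (reachable, seconds_taken)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # covers timeouts, refused connections and name resolution failures
        return False, time.monotonic() - start


def watch(host, port=443, interval=30.0, attempts=10, timeout=5.0):
    """Probe repeatedly and print one line per attempt; return all results."""
    results = []
    for _ in range(attempts):
        ok, elapsed = tcp_probe(host, port, timeout)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {host}:{port} reachable={ok} took={elapsed:.3f}s")
        results.append((ok, elapsed))
        time.sleep(interval)
    return results
```

For example, `watch("cluster-mgmt.example.com", attempts=120)` (hypothetical host name) would cover about an hour at the default 30 second interval. Slow or failed connects that line up with the OCUM alerts would point at the network path rather than at OCUM itself.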
We saw this a lot on an OnCommand server that was monitoring a physically distant cluster. We opened a support case, and support can modify a timeout value to allow for longer response times. Once this was set, the issue went away. You could try that; at the very least, they may be able to identify any other issues.
This became apparent when we upgraded to ONTAP 9.3P2 and I could no longer reach the System Manager interface by name, but could still reach it by IP. (This was because the DNS entry for the cluster mgmt LIF included an underscore, which I guess is not permitted by 9.3 standards.)
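If you want to check your DNS names for this before upgrading, something like the following flags labels that break the usual hostname rules (letters, digits and hyphens only, no leading or trailing hyphen, at most 63 characters per label). This is just a sketch based on the general RFC 952/1123 hostname conventions. I am assuming that is what 9.3 enforces; I have not confirmed the exact check ONTAP applies. The function names and example host names are made up.

```python
import re

# One DNS label: starts and ends with a letter or digit, hyphens allowed
# in between, 1 to 63 characters. Underscores are deliberately not allowed.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def invalid_labels(hostname):
    """Return the labels of hostname that break the hostname rules."""
    name = hostname[:-1] if hostname.endswith(".") else hostname
    return [label for label in name.split(".") if not _LABEL.fullmatch(label)]


def is_valid_hostname(hostname):
    """True if every label is valid and the whole name fits in 253 chars."""
    name = hostname[:-1] if hostname.endswith(".") else hostname
    return 0 < len(name) <= 253 and not invalid_labels(name)
```

With this, `invalid_labels("cluster_mgmt.example.com")` returns `['cluster_mgmt']`, while `is_valid_hostname("cluster-mgmt.example.com")` is true.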
I had this issue on multiple clusters. The issue turned out to be the routing tables. Apparently in 9.x you MUST have proper static routes; if you don't, you will experience this error. The way it was explained to me, in prior versions a packet coming in through a LIF/route went out the same way. Now it only goes out via a static route, unless there is a dynamic route set up.
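To make the routing point concrete, here is a small sketch of how a reply would be sent if it follows the routing table only: the most specific route that covers the destination wins, and if no route covers it, there is no way back. This is a generic longest-prefix-match illustration, not ONTAP's actual implementation. The route entries, the addresses and the function name are all invented for the example.

```python
import ipaddress


def select_route(routes, destination):
    """Pick the most specific route covering destination.

    routes is a list of (prefix, gateway) strings.
    Returns (network, gateway), or None if no route matches.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.ip_network(prefix)
        # keep the matching route with the longest prefix
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return best


# Hypothetical route table for the admin SVM
routes = [
    ("0.0.0.0/0", "10.0.0.1"),
    ("192.168.50.0/24", "10.0.0.254"),
]

# Hypothetical OCUM server address
print(select_route(routes, "192.168.50.20"))
```

In this example the reply to the OCUM server at 192.168.50.20 leaves via gateway 10.0.0.254, regardless of which LIF the request arrived on. If the table had neither a matching route nor a default route, `select_route` would return `None`, which is the case to look for when checking your own route tables against the OCUM server's address.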
Do you know the specific timeout value that was modified? We are experiencing similar issues with our systems and are looking to modify the timeout value, but we are not sure which option to use or what value range is appropriate.