I constantly get the following messages for one of my clusters in OPM 2.0:
"Cluster cuba (cuba) is unreachable. Performance Manager is no longer monitoring this cluster."
"Unable to consistently collect from Cluster cuba (cuba)."
It generates a "Cluster not reachable" event in my OCUM 6.3 instance, which is marked "obsolete" shortly afterwards, when the next acquisition interval occurs. This then repeats over and over, and the performance graphs are full of gaps.
The cluster management network and the ESX servers hosting OPM are connected to the same switch. Other clusters in the same environment and subnet as "cuba" run without issues.
The only difference is the ONTAP version: cuba runs 8.3, all others run 8.3.1RC1.
Any idea how to troubleshoot this further? Which logs should I check, and how, given that the vApp is locked down pretty well?
The 'unable to consistently collect' error is thrown if 5% or more of the collections in the last 24 hours have failed. Things to check: whether the cluster is too busy (make sure the number of monitored instances is within the supported limits) and whether the VM has enough CPU and memory resources (compared to the documented requirements). The actual reason can hopefully be found in the acquisition logs. These are available in the AutoSupport (trigger it from the GUI) or in the support bundle (request it from the VM console). NetApp and partner staff can then view the ASUP in smartsolve.
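To make the 5% rule concrete, here is a minimal Python sketch of that kind of check. It is not OPM's actual code: the function name, the `(timestamp, succeeded)` record shape and the 5-minute polling interval in the usage note are assumptions made up for illustration. Only the 5% threshold and the 24-hour window come from the description above.

```python
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 0.05       # 5% of collections, per the rule described above
WINDOW = timedelta(hours=24)   # look-back window, per the rule described above


def is_inconsistent(collections, now):
    """Return True if 5% or more of the collections in the last 24 hours failed.

    `collections` is an iterable of (timestamp, succeeded) tuples. This record
    shape is hypothetical and not taken from the OPM logs.
    """
    # Keep only the outcomes that fall inside the 24-hour window.
    recent = [ok for ts, ok in collections if now - WINDOW <= ts <= now]
    if not recent:
        return False  # nothing collected, so there is nothing to judge
    failed = sum(1 for ok in recent if not ok)
    return failed / len(recent) >= FAILURE_THRESHOLD
```

As a usage example, if you assume one collection every 5 minutes (a hypothetical interval), there are 288 collections per day, so 15 failed collections (about 5.2%) would trip the check while 14 (about 4.9%) would not. In other words, it takes roughly an hour and a quarter of missed polls per day to raise the event.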
I have the same issue at a customer site. This customer has deployed a new OPM 2.0 VM with the default specs (4 vCPU and 12 GB memory). The cluster that was added is a new 4-node cluster. We only see the message for this new 4-node cluster (cDOT 8.3.1) and not for the older cluster (cDOT 8.2.3).
The new cluster is not busy and is idle most of the time at the moment.