
Data collection issues with OPM 2.0

Hi,

 

I constantly get the following messages for one of my clusters in OPM 2.0:

 

"Cluster cuba (cuba) is unreachable. Performance Manager is no longer monitoring this cluster."

or

"Unable to consistently collect from Cluster cuba (cuba)."

 

It generates a "Cluster not reachable" event in my OCUM 6.3 instance, which is marked "obsolete" shortly afterwards, when the next acquisition interval occurs. It then recurs over and over. The performance graphs look pitted.

 

The cluster management network and the ESX servers hosting OPM are connected to the same switch. Other clusters in the same environment and subnet as "cuba" run without issues.

The only difference is the ONTAP version: cuba runs 8.3, all others run 8.3.1RC1.

 

Any idea how to troubleshoot this further? Which logs should I check, and how, given that the vApp is locked down pretty well?

 

Thanks and regards, Niels

 

Re: Data collection issues with OPM 2.0

The 'unable to consistently collect' error is thrown when 5% or more of the collections in the last 24 hours have failed. Things to check are whether the cluster is too busy (confirm that the number of monitored instances is within the supported limits) and whether the VM has enough CPU and memory resources (compare against the documented requirements). The reason can hopefully be found in the acquisition logs. These are available in the AutoSupport (trigger it from the GUI) or in the support bundle (request it from the VM console). NetApp and partner staff can then view the ASUP in SmartSolve.
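To make the 5% rule concrete, here is a minimal Python sketch of the arithmetic. It is not OPM's actual implementation: the function name, the `(timestamp, succeeded)` sample format, and the 5-minute collection interval used in the example are all assumptions for illustration only.

```python
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 0.05      # 5%, as described above
WINDOW = timedelta(hours=24)  # look-back window, as described above


def collection_is_inconsistent(samples, now):
    """Return True if at least 5% of collections in the last 24 hours failed.

    `samples` is a hypothetical list of (timestamp, succeeded) tuples;
    OPM's real internal data format is not known here.
    """
    # Keep only the outcomes that fall inside the 24-hour window.
    recent = [ok for ts, ok in samples if now - WINDOW <= ts <= now]
    if not recent:
        return False  # nothing collected, nothing to judge
    failed = sum(1 for ok in recent if not ok)
    return failed / len(recent) >= FAILURE_THRESHOLD


# Example: assuming one collection every 5 minutes (288 per day),
# the 15 most recent collections are marked as failed.
now = datetime(2015, 9, 1, 12, 0)
samples = [(now - timedelta(minutes=5 * i), i >= 15) for i in range(288)]
print(collection_is_inconsistent(samples, now))
```

Under that assumed interval, 15 failed collections out of 288 (about 5.2%) would cross the threshold, while 14 (about 4.9%) would not. So roughly 75 minutes of missed collections per day, even if scattered, would be enough to raise the message.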

 

Hope that gives you an idea of next steps...

 

Cheers,

Chris Madden
Storage Architect, NetApp EMEA

Re: Data collection issues with OPM 2.0

Hi,

 

I have the same issue at a customer site. This customer has deployed a new OPM 2.0 VM with the default specs (4 vCPU and 12 GB memory). The cluster added is a new 4-node cluster. We only see the message for this new 4-node cluster (cDOT 8.3.1) and not for the older cluster (cDOT 8.2.3).

 

The new cluster is not busy; it is idle most of the time at the moment.

 

We cannot find anything wrong in the logs.

 

Regards,

Marco