I have a classic alert: "The cluster xxxx (IP address) is not reachable through OPM, or OPM performance collection failed. Cluster xxxx performance events have stopped being sent by Performance Manager server yyyy. This condition could be caused by a communication issue between Unified Manager and Performance Manager, or between Performance Manager and cluster xxxx."
I also have:
"Pairing Status (Bad): The credentials provided by Performance Manager for the user with Event Publisher role are either incorrect, or the Performance poll is not complete or has failed."
Well, I am sure of nothing for now, but everything seems to work after all, including monitoring and events. I should point out that on this platform we monitor only one node with two heads, A and B, not two nodes; it is not really a cluster of several nodes. I did not install Unified Manager or Performance Manager myself, so I do not know the application well. Under "Management" > "Users" I see zero users added, and the "Add" user button is greyed out, so I could not add anyone even if I wanted to. The SSL certificate is valid from 2017 to 2022. In the "Performance" tab, "Events" shows an "Event State" column marked "Obsolete", but the detected time is correct...
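Since the certificate's validity window comes up here, one way to double-check it without clicking through the UI is to read it straight off the server with `openssl`. This is only a sketch: `opm.example.com` and port 443 are placeholders for your Performance Manager address, and the second part just demonstrates the same `-dates` inspection offline on a throwaway self-signed certificate.

```shell
# Remote variant (assumption: the OPM/OCUM web UI listens on HTTPS/443;
# replace opm.example.com with your server's address):
#   echo | openssl s_client -connect opm.example.com:443 2>/dev/null \
#     | openssl x509 -noout -dates -subject

# Offline demonstration of the same -dates/-subject inspection,
# using a throwaway self-signed certificate:
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -days 365 -subj "/CN=opm.example.com" 2>/dev/null
openssl x509 -noout -dates -subject -in /tmp/demo.crt
```

The `notBefore`/`notAfter` lines are the validity window; if the current date falls outside it, or the subject does not match the hostname the other server uses to reach it, TLS between the two servers can fail even when everything else is configured correctly.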
Do you have any idea why I am getting these messages? If the OPM performance collection were failing, I would expect the data not to be updated correctly. Or is some user missing from the configuration?
What version of OCUM and OPM are you running? They seem rather old if they are two separate servers.
OCUM and OPM were joined into a single server with OCUM 7.0. The stand-alone versions went EOS in January.
Looks like you should upgrade, or, if you don't care too much about historical data, re-install.
The minimum recommended version would be 7.3. You would need several interim steps to get from whatever version you have to 7.3, many of which are EOA/EOS and can only be obtained through a special process.
As you run OCUM/OPM 7.1, it's not as old as I initially thought.
Nonetheless, 7.1 will become EOS at the end of the year, so you can still open a support call to get your issue corrected.
In the past, when I observed similar communication issues, they were caused by inconsistent IDs in the OCUM and OPM databases, so the clusters could not be matched between the two.
Support has a procedure to get those corrected, in case that is the actual cause. Please have it investigated by NetApp support.
Once cleared, I suggest upgrading.
You would need to go through OCUM 7.2, which is the first version where OCUM and OPM were merged into a single server. OCUM 7.2 contains a workflow to import historical performance data from your existing 7.1 OPM.
Once done, plan to upgrade to at least 7.3.
You can potentially just upgrade directly if you don't care about historical performance data.
Once upgraded to 7.2 or later, OCUM will immediately start collecting perf data from your clusters, including up to the past 13 days from the ONTAP performance archive files.
As OCUM and OPM are merged, there is no communication between the servers that could fail.
1. We check with NetApp support for help fixing that.
2. Once done, we upgrade to 7.2. OCUM will then automatically migrate the data from OPM by itself.
3. We upgrade to 7.3.
If we can't go through NetApp, we can still upgrade OCUM anyway. But in that case OCUM will keep its data, while OPM will lose its data and start from scratch.
I will search the Internet for any process specific to upgrading OCUM to 7.2.
One question: can OCUM work without OPM? If we stop OPM and keep OCUM running without changing its version, what will happen? Will we just get a new error message, or must we change the configuration on OCUM to allow the application to work alone?
Well, I am not sure I can open a ticket with NetApp, because even though the filer is supported by NetApp, that does not seem to be the case for OCUM/OPM. I tried a few days ago, and the service replied that we would have to pay to get an answer...