2015-09-09 05:44 AM
I have a problem with OPM 1.1 and a new CDOT cluster running ONTAP 8.2P2 which is located in another network. I also have an older cluster running 8.2.2P1 on the same network as OPM and it works fine.
Anyway, I opened port TCP/443 from OPM towards the Netapp and after that the discovery process started to work. After a few minutes, though, I started to get the following error message on the "Manage Data Sources" page: Communication problem with the cluster: xx.xx.xx.xx, Failed to download Archive files. Error: ',xx.xx.xx.xx: Download wait timed out after 240000 millisecods. ' on try 37 out of 37.
Today those error messages are gone but I still get this error message on OPM: "Cluster_name (xx.xx.xx.xx)" is unreachable. Performance Manager is no longer monitoring this cluster.
I've created a dedicated user account for this purpose and given it "ontapi" rights.
Is there some other port towards the Netapp that has to be opened?
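As a quick sanity check that TCP/443 really is reachable from the OPM host, a plain socket connect can help. This is only a minimal sketch: the function name `can_connect` is made up for illustration, and a successful connect only shows that the port is open, not that the login or the archive download will work.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only. It tests plain TCP
    reachability and knows nothing about ONTAP or OPM.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Run from the OPM server, `can_connect("xx.xx.xx.xx")` would be called with the cluster management address in place of the placeholder.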
2015-09-24 03:14 AM
I have exactly the same issue with the user account I created with "ontapi" rights. It seems that these rights are not enough, because if I use the admin user everything works fine.
Does anybody have a hint what permissions must be added?
2015-10-07 01:51 PM - edited 2015-10-07 01:53 PM
There are two issues here:
1. Logging message
We have burt 889705, which tracks the issue with the logging message. It has recently been reopened and targeted for the next release.
The problem is that the log simply says it is not able to collect stats from the cluster but does not specify why.
2. Primary issue: User privilege
In this case, the root cause could be what you and Felix suspect: the user does not have the proper privileges.
When you set up the admin user, did you enable BOTH ontapi AND http? See attached screenshot for details.
If you continue to have problems, please let me know.
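The "both ontapi and http" requirement can be written down as a small check. This is a sketch under assumptions: `missing_applications` is a hypothetical helper, not part of any NetApp tool, and the list of granted applications has to be copied by hand from whatever the cluster shows for the account.

```python
# The two applications named in this thread as required for the OPM user.
REQUIRED_APPLICATIONS = {"ontapi", "http"}


def missing_applications(granted):
    """Return the required applications that are absent from `granted`.

    `granted` is any iterable of application names typed in by hand;
    this function does not query a cluster.
    """
    normalized = {name.strip().lower() for name in granted}
    return sorted(REQUIRED_APPLICATIONS - normalized)
```

For an account that only has "ontapi" rights, `missing_applications(["ontapi"])` returns `["http"]`.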
2015-10-08 02:59 AM - edited 2015-10-08 03:01 AM
Thank you Julia,
after adding the "http" permission to my OPM user, it also works with this user. In the last few days of testing OPM we sometimes had problems with the reachability of the cluster, but this only lasts a few minutes and then everything works fine again. Maybe this has something to do with the burt? Where can I get information about the burt? Is there publicly accessible documentation?
The message we get in this case: 11:24 AM, 8 Oct : Cluster xxxx (xxxx) is unreachable. Performance Manager is no longer monitoring this cluster.
Maybe you have an idea where this effect comes from.
2015-10-08 06:53 AM
BURT is NetApp's internal bug tracking system. I have increased the priority of this bug based on your feedback here, and below is the content of that bug (burt):
This is a consequence of a series of customer cases involving the following error message in the jboss-logs-au.log and the OPM UI :
"Failed to Download Archives Files. Timed out after:240000 milliseconds on try 37 out of 37"
The current error message logged does not take into account the HTTP response code returned by the SPI server in ONTAP.
Enhancing the log messages to include this information along with the potential cause can help identify the root cause of the connection failure faster in the customer environment.
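To illustrate the kind of enhancement the burt describes, here is a rough sketch of an error message that carries the HTTP response code and a possible cause. It is not the actual OPM code: the function name, the message wording and the cause table are all assumptions made for illustration, and the listed causes are generic readings of the status codes rather than anything taken from the burt.

```python
from http import HTTPStatus

# Illustrative guesses only, based on the general meaning of each status code.
LIKELY_CAUSES = {
    401: "credentials rejected; check the user name and password",
    403: "access denied; check that the account is allowed to use http",
    404: "requested resource not found on the server",
}


def describe_download_failure(cluster, status, attempt, max_attempts):
    """Build a log line that includes the HTTP status and a possible cause."""
    try:
        reason = HTTPStatus(status).phrase
    except ValueError:
        # Status code not defined in the standard library enum.
        reason = "Unknown status"
    cause = LIKELY_CAUSES.get(status, "no specific cause known")
    return (
        f"Failed to download archive files from {cluster}: "
        f"HTTP {status} {reason} on try {attempt} out of {max_attempts}. "
        f"Possible cause: {cause}."
    )
```

With a message like this, a permissions problem such as the missing http privilege discussed above would be visible in the log instead of only a timeout.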
Now back to your system ... are performance stats getting collected at all? On the dashboard, can you see stats for IOPs, Latency, MBps etc.?
2015-10-08 11:30 PM
Yes, everything is getting collected. When this problem occurs, the OnCommand Unified Manager dashboard shows a reachability risk for the cluster, and it takes a long time to switch back to green after the problem is solved, which is very disturbing. And if you click on the risk, no event is shown, maybe because the problem is already solved again. That's really annoying when you expect it to turn green immediately after the problem is solved.
2016-03-23 03:32 PM
I have the same problem on OCP 2.0.0RC1 talking to a CDOT 8.3 cluster. Status: "Network Acccess_failure". Status message: "Communication Problem with the Cluster. Failed to Download Archive Files. Error 'clustername: Download wait timed out after:240000 milliseconds.' on try 37 out of 37." What does this mean, and how do I solve it? I have other CDOT clusters running 8.3, 8.3.1 and even 8.2.3 which are working fine with OPM. I don't see stats for this particular cluster on OPM: no IOPS, latency or utilization, nothing. Please advise.
2016-05-05 08:25 AM
I now have this issue. The cluster was being monitored fine until I added 2 new nodes (AFF8080's) that were on 8.3.2. I have since downgraded them to 8.3.1 so the cluster versions are now in sync, but OPM V2.0.0 is saying it "Failed to download Archive files".
Any ideas anyone?
2016-08-02 10:01 PM
We also found that when you enable FULL integration between OCUM and OCPM, the OCUM credentials are then used for OCPM which *may* not have the right permissions (in our case, it didn't have HTTP access, only ONTAPI