I've installed the OCUM 6.x family many times, so I have most of the install/setup concepts down. However, this is my first install on Windows. It is in a lab, on a VM with 12GB of memory and 4 cores. I believe I had to install it twice, maybe three times, to get all the VM requirements in place, and I admit that some of the MSI install attempts failed (hung for hours) and the VM had to be knocked down and restarted. But the final MSI install seemed to go according to plan and reported success.
So I'm adding my first cluster. The process seemed to start fine: I filled out the form with the hostname (cluster_mgmt name), user, and password, and it told me the cluster presented a self-signed certificate, which I said was acceptable. The cluster then appeared in the list showing discovery in progress.
In a nutshell, I'm in a deadlock: OCUM still reports discovery 'in progress' (for over a day) and will not let me delete the cluster entry (because discovery is in progress), yet I see little or no evidence of discovery activity going on. I've looked at a lot of log files and attached those which might be giving me clues, but I'm out of ideas. I'd love to delete the cluster and start over but see no way to do that.
- the firewall on the OCUM server (called admin-um) has been completely shut off
- this same cluster has been used with prior 6.x OCUMs for months
- pings work in both directions (ocum->cluster, cluster->ocum) using IP-address, simple-DNS-name, and FQDN
- this morning I noticed the service called ocie-au was not running ... I started it with no apparent effect
- I notice that the services listed on the Windows Task Manager 'Services' tab seem very different from those in the 'Services' app ... is there documentation or an explanation anywhere?
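Since ping only proves ICMP reachability, a TCP-level check is a useful complement to the list above. This is a minimal sketch, assuming the cluster management interface is reached over HTTPS on port 443; the helper name `tcp_reachable` and the host names in the usage note are hypothetical.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This checks name resolution plus the TCP handshake only; it says
    nothing about TLS or about the credentials OCUM would use.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from the OCUM server, something like `tcp_reachable("cluster-mgmt.example.com", 443)` (substituting the real IP address, short name, and FQDN in turn) shows whether each form of the name reaches the port, not just whether it answers ping.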
Thanks Kirana, that has helped me try some more experiments, but I'm still dead in the water with no discovery.
Some more background and troubleshooting data:
- wiped the previous VM and started with a new clean Win2012R2 VM
- based on an OCUM 7 release note, I turned off IPv6 entirely on the OCUM server; pings are all now IPv4 based (even localhost pings show 127.0.0.1)
- re-installation of OCUM 64p1 hung the system once (a high IO rate overloaded the hardware); I re-installed and got a successful install
I've now attempted to add a cluster 2 or 3 times, each time ensuring ocie-au was running, and after using um datasource to delete the prior cluster attempt.
I've studied the log files carefully and, while I don't understand the internal architecture, the only thing I find of interest is the au.log snippet at the end of this post, which coincides with a recent ocie-au start (8/22 @ 9:48'ish) and a new cluster add.
This sounds like an internal failure setting up an SSL-based session to port 20443?!?
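One way to narrow this down is to attempt the same handshake by hand from the OCUM server. The sketch below is only a rough probe: the host and port (`localhost`, 20443) come from au.log, while the function name `probe_tls` and the choice to skip certificate verification (assuming the local server presents a self-signed certificate) are assumptions. Python's TLS defaults also differ from those of the Java acquisition unit, so a success here would not prove that ocie-au can connect.

```python
import socket
import ssl


def probe_tls(host, port, verify=False, timeout=5.0):
    """Attempt a TLS handshake and return a (stage, detail) tuple.

    stage is "ok", "connect_failed", or "handshake_failed".
    """
    context = ssl.create_default_context()
    if not verify:
        # Assumption: the server uses a self-signed certificate.
        # Disabling verification is for diagnosis only.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Nothing listening, name not resolved, or connection timed out.
        return ("connect_failed", str(exc))
    try:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # Handshake completed; report protocol version and cipher.
            return ("ok", "{} {}".format(tls.version(), tls.cipher()[0]))
    except OSError as exc:
        # ssl.SSLError is a subclass of OSError, so this also covers
        # protocol mismatches and certificate errors.
        return ("handshake_failed", str(exc))
    finally:
        sock.close()
```

Calling `probe_tls("localhost", 20443)` separates "nothing is listening" from "something is listening but the handshake is rejected", and in the second case the detail string usually hints at whether the protocol version or the certificate is the problem.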
Anyone out there have a successful instance of OCUM64p1 on a Windows server?
2016-08-22 09:49:29,761 INFO [Thread-2] com.onaro.sanscreen.acquisition.framework.Main (Main.java:156) - Shutting down by external request
2016-08-22 09:52:45,491 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.Main (Main.java:61) - Main - Starting acquisition...
2016-08-22 09:52:45,522 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.ServiceManager (JBossLoggerAdapter.java:156) - Importing general preferences from file:/C:/Program%20Files/NetApp/essentials/au/conf/preferences/general.properties
2016-08-22 09:52:45,585 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:156) - Attempting to verify server at: https://localhost:20443
2016-08-22 09:52:45,913 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:165) - SSL handshake failure for https://localhost:20443
2016-08-22 09:52:45,913 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:148) - Server up!
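Note that the handshake failure above is logged at INFO and is immediately followed by "Server up!", so filtering the log by severity would miss it. A small script can pull such lines out of a long au.log by message text instead. This is a sketch that assumes every entry follows the layout of the lines above; the names `parse_line` and `find_messages` are hypothetical.

```python
import re

# Layout assumed from the excerpt:
# timestamp LEVEL [thread] logger (File.java:line) - message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) "
    r"\((?P<source>[^)]+)\) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def find_messages(lines, needle):
    """Return parsed entries whose message contains needle (case-insensitive)."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and needle.lower() in entry["message"].lower():
            hits.append(entry)
    return hits
```

Passing an open file handle for au.log as `lines` and `"ssl handshake failure"` as `needle` lists every occurrence with its timestamp, which makes it easy to see whether the failure repeats on each ocie-au start.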