2016-08-18 11:02 AM
I've installed the OCUM 6.x family many times, so I have most of the install/setup concepts down. However, this is my first install on Windows. This is in a lab, on a VM with 12GB of memory and 4 cores. I believe I had to install it twice, maybe three times, to get all the VM requirements in place, so I admit that some of the MSI install attempts failed (hung for hours) and the VM had to be knocked down and restarted. But the final MSI install seemed to go according to plan and reported success.
So I'm adding my first cluster. The process seemed to start fine: I filled out the form with the hostname/cluster_mgmt name, user, and password, and it told me the cluster presented a self-signed certificate, which I said was acceptable. The cluster then appeared in the list, showing discovery in progress.
In a nutshell, I'm in a deadlock: OCUM still reports discovery 'in progress' (for over a day), will not let me delete the cluster entry (because discovery is in progress), and I see little or no evidence of discovery activity going on. I've looked at a lot of log files and attached those that might be giving me clues, but I'm out of ideas. I'd love to delete the cluster and start over but see no way to do that.
- firewall on the ocum-server (called admin-um) has been completely shut off
- this same cluster has been used with prior 6.x OCUMs for months
- pings work in both directions (ocum->cluster, cluster->ocum) using IP-address, simple-DNS-name, and FQDN
- this morning I noticed the service called ocie-au was not running ... I started it with no apparent effect
- I notice that the services listed on the Windows Task Manager 'Services' tab seem very different from those in the 'Services' app ... is there documentation or an explanation anywhere?
- I rebooted the ocum server
Any ideas on things to look for or things to try?
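One way to confirm the service state mentioned in the list above is to read it from `sc query`, the standard Windows service-control command. This is only a sketch: it assumes the Windows service is registered under the name `ocie-au` (the name used in this post), and `parse_sc_state` / `service_state` are illustrative helper names, not part of OCUM.

```python
import re
import subprocess


def parse_sc_state(sc_output: str):
    """Return the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` output, or None."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def service_state(name: str = "ocie-au"):
    """Run `sc query <name>` (Windows only) and return the parsed state.

    The default service name is an assumption taken from this thread.
    """
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout)

# Usage on the OCUM server (Windows):
#   print(service_state())   # None means the service name was not found
```

Checking before and after a cluster add would show whether the service stays up or stops again on its own.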
2016-08-19 06:36 AM
The ocie-au (acquisition unit) service has to be **up** for discovery/monitoring of clusters to go through.
You can follow the steps below to remove and re-add the cluster:
*Ensure ocie-au is running.
*Open a new command prompt and run 'um cli login -u <maint_username> -p <maint_userpassword>'
*Then run 'um datasource list' to get the Object ID of the cluster stuck in the discovery phase, and run 'um datasource remove <obj_id>' (this will take care of deleting all DB foreign links too).
*Then log out using 'um cli logout'. Now re-add the cluster in the web GUI clusters grid.
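The steps above can be sketched as a small script that assembles the same `um` commands. This is a hedged illustration only: the helper names `build_removal_steps` and `run_steps` are made up for this example, and the Object ID still has to be read by hand from the `um datasource list` output, since that output's format is not shown in this thread.

```python
import subprocess


def build_removal_steps(user: str, password: str, obj_id: str):
    """Return the um CLI invocations from the steps above, as argv lists."""
    return [
        ["um", "cli", "login", "-u", user, "-p", password],
        ["um", "datasource", "list"],               # read the Object ID from this output
        ["um", "datasource", "remove", obj_id],     # obj_id: the cluster stuck in discovery
        ["um", "cli", "logout"],
    ]


def run_steps(steps, dry_run: bool = True):
    """Return each command as a string; when dry_run is False, also execute it."""
    shown = []
    for argv in steps:
        shown.append(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
    return shown
```

Running it with `dry_run=True` first only lists the commands, which is a safe way to double-check the Object ID before anything is removed.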
2016-08-22 08:23 AM - edited 2016-08-22 08:33 AM
Thanks Kirana, that has helped me try some more experiments, but I'm still dead in the water with no discovery.
Some more background and troubleshooting data:
- wiped the previous VM and started with a new clean Win2012R2 VM
- based on an OCUM 7 release note, I turned off IPv6 entirely on the OCUM server; pings are now all IPv4 based (even localhost pings show 127.0.0.1)
- re-installation of OCUM 64p1 hung the system once (high IO rate overloaded the hardware); I re-installed and got a successful install
I've now attempted to add a cluster 2 or 3 times, each time ensuring ocie-au was running, and each time after removing the prior cluster attempt with um datasource remove.
I've studied the log files carefully and, while I don't understand the internal architecture, the only thing I find of interest is this snippet from au.log, coinciding with a recent ocie-au start (8/22 @ 9:48'ish) and a new cluster add.
This sounds like an internal failure setting up an SSL-based session to port 20443?!?
Anyone out there have a successful instance of OCUM64p1 on a Windows server?
2016-08-22 09:49:29,761 INFO [Thread-2] com.onaro.sanscreen.acquisition.framework.Main (Main.java:156) - Shutting down by external request
2016-08-22 09:52:45,491 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.Main (Main.java:61) - Main - Starting acquisition...
2016-08-22 09:52:45,522 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.ServiceManager (JBossLoggerAdapter.java:156) - Importing general preferences from file:/C:/Program%20Files/NetApp/essentials/au/conf/preferences/general.properties
2016-08-22 09:52:45,585 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.FrameworkService (FrameworkService.java:79) - Asynchronous Task Manager - Initialized!
2016-08-22 09:52:45,585 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.FrameworkService (FrameworkService.java:79) - Logs Manager - Initialized!
2016-08-22 09:52:45,585 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:156) - Attempting to verify server at: https://localhost:20443
2016-08-22 09:52:45,913 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:165) - SSL handshake failure for https://localhost:20443
2016-08-22 09:52:45,913 INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:148) - Server up!
2016-08-22 09:52:45,975 ERROR [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:248) -
com.onaro.sanscreen.acquisition.framework.FrameworkException: Software caused connection abort: recv failed
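For longer logs, entries like the ones above can be picked out with a small filter. This is a rough sketch that assumes every au.log line follows the layout seen in this excerpt (timestamp, level, thread, logger, source location, message); the names `LOG_LINE`, `parse_log`, and `suspicious` are illustrative only.

```python
import re

# Layout assumed from the excerpt above:
# <timestamp> <LEVEL> [<thread>] <logger> (<file>:<line>) - <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) \((?P<src>[^)]+)\) -\s?(?P<msg>.*)$"
)


def parse_log(lines):
    """Parse log lines into dicts; continuation lines are appended to the previous entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = LOG_LINE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            # Not a new entry (e.g. an exception message): attach it to the last one.
            entries[-1]["msg"] = (entries[-1]["msg"] + " " + line.strip()).strip()
    return entries


def suspicious(entries):
    """Keep ERROR entries and any entry mentioning an SSL handshake failure."""
    return [
        e for e in entries
        if e["level"] == "ERROR" or "SSL handshake failure" in e["msg"]
    ]
```

Applied to the excerpt above, this would keep the 09:52:45,913 handshake line and the 09:52:45,975 ERROR entry with its exception text attached.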
I've attached the most recent log files which seem to be being updated as discovery is attempted.
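To test the port 20443 theory independently of the acquisition unit, a TLS handshake can be attempted from the OCUM server itself. This is a minimal sketch, not a diagnosis: `probe_tls` is a made-up helper name, and skipping certificate verification assumes the local server presents a self-signed certificate, which is a guess on my part.

```python
import socket
import ssl


def probe_tls(host: str = "localhost", port: int = 20443, timeout: float = 5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()
    # Lab assumption: self-signed certificate, so verification is disabled.
    # check_hostname must be switched off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"{tls.version()} {tls.cipher()[0]}"
    except ssl.SSLError as exc:
        return False, f"TLS handshake failed: {exc}"
    except OSError as exc:
        # Refused, reset, or timed-out connections end up here.
        return False, f"connection failed: {exc}"

# Usage on the OCUM server:
#   print(probe_tls())
```

A refused or reset connection would point at the listener on 20443 rather than at the cluster. Note that a handshake failure from a recent Python build could also just reflect a protocol-version mismatch with an older server, so the result is a hint, not proof.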