
Troubleshooting an OCUM 6.4P1 on Windows install/setup

I've installed the OCUM 6.x family many times, so I have most of the install/setup concepts down. However, this is my first install on Windows. It's in a lab, on a VM with 12GB of memory and 4 cores. I believe I had to install it twice, maybe three times, to get all the VM requirements in place, so I admit that some of the MSI install attempts failed (hung for hours) and the VM had to be knocked down and restarted. But the final MSI install seemed to go according to plan and reported success.

 

So I'm adding my first cluster. The process seemed to start fine: I filled out the form with the hostname/cluster_mgmt name, user, and password, and it told me the cluster presented a self-signed certificate, which I accepted. The cluster then appeared in the list showing discovery in progress.

 

In a nutshell, I'm in a deadlock: OCUM still reports discovery 'in progress' (for over a day), will not let me delete the cluster entry (because discovery is in progress), and I see little or no evidence of discovery activity going on. I've looked at a lot of log files and attached those that might offer clues, but I'm out of ideas. I'd love to delete the cluster and start over but see no way to do that.

 

Other details.

  - firewall on the ocum-server (called admin-um) has been completely shut off

  - this same cluster has been used with prior 6.x OCUMs for months

  - pings work in both directions (ocum->cluster, cluster->ocum) using IP-address, simple-DNS-name, and FQDN

  - this morning I noticed the service called ocie-au was not running ... I started it with no apparent effect

  - I notice that the services listed on the Windows Task Manager Services tab seem very different from those in the 'Services' app ... is there documentation or an explanation anywhere?

  - I rebooted the ocum server 

 

Any ideas on things to look for or things to try?

Re: Troubleshooting an OCUM 6.4P1 on Windows install/setup

Hi dkorns,

 

The ocie-au (acquisition unit) service has to be **up** for discovery/monitoring of clusters to go through.

 

You can follow the steps below to remove and re-add the cluster:

*Ensure ocie-au is running.

*Open a new command prompt and run 'um cli login -u <maint_username> -p <maint_userpassword>'

*Then run 'um datasource list' to get the Object ID of the cluster stuck in the discovery phase, and run 'um datasource remove <obj_id>' (this also takes care of deleting all DB foreign links).

*Then log out using 'um cli logout'. Now re-add the cluster in the web GUI clusters grid.
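The remove sequence above could be scripted roughly as follows. This is only a sketch: it uses the exact `um` invocations quoted in the steps, assumes `um` resolves to an executable on the PATH, and assumes the Object ID has already been read from 'um datasource list' by hand (the output format of that command is not shown in this thread, so nothing here tries to parse it). The function name `remove_datasource` and the `runner` parameter are made up for illustration.

```python
import subprocess


def remove_datasource(maint_user, maint_password, object_id, runner=subprocess.run):
    """Log in, remove one datasource by Object ID, then log out.

    Returns True only if both the login and the remove exit with code 0.
    `runner` defaults to subprocess.run; it is a parameter so the sequence
    can be exercised without a real OCUM install.
    """
    login = ["um", "cli", "login", "-u", maint_user, "-p", maint_password]
    if runner(login).returncode != 0:
        return False
    try:
        remove = ["um", "datasource", "remove", str(object_id)]
        removed = runner(remove).returncode == 0
    finally:
        # Always log out once the login succeeded, even if the remove failed.
        runner(["um", "cli", "logout"])
    return removed
```

Re-adding the cluster is still done in the web GUI afterwards, as described in the last step.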

 

Re: Troubleshooting an OCUM 6.4P1 on Windows install/setup


Thanks Kirana, that has helped me try some more experiments, but I'm still dead in the water with no discovery.

 

Some more background and troubleshooting data:

  - wiped the previous VM and started with a new clean Win2012R2 VM

  - based on an OCUM 7 release note, I turned off IPv6 entirely on the OCUM server; pings are all now IPv4 based (even localhost pings show 127.0.0.1)

  - re-installation of OCUM 6.4P1 hung the system once (a high IO rate overloaded the hardware); I re-installed and got a successful install

 

I've now attempted to add a cluster 2 or 3 times, each time ensuring ocie-au was running, and after deleting the prior cluster attempt with um datasource.

 

I've studied the log files carefully and, while I don't understand the internal architecture, the only thing I find of interest is this snippet from au.log, coinciding with a recent ocie-au start (8/22 @ 9:48'ish) and a new cluster add.

 

This sounds like an internal failure setting up an SSL-based session to port 20443?!?
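One way to test that suspicion independently of OCUM is to attempt a TLS handshake against the same endpoint from a small script run on the OCUM server. The sketch below uses only the Python standard library; the function name `probe_tls` is made up, and the host and port defaults are simply the ones that appear in au.log. Certificate verification is switched off on purpose, since the question is only whether a handshake can complete at all. A failure here may also reflect this client's own protocol settings rather than the server, so treat the result as a hint, not a diagnosis.

```python
import socket
import ssl


def probe_tls(host="localhost", port=20443, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()
    # Skip verification: the server certificate may be self-signed, and this
    # probe only checks whether the handshake itself succeeds.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, "%s, cipher %s" % (tls.version(), tls.cipher()[0])
    except OSError as exc:
        # ssl.SSLError and socket timeouts are both subclasses of OSError.
        return False, "%s: %s" % (type(exc).__name__, exc)


if __name__ == "__main__":
    ok, detail = probe_tls()
    print("handshake ok" if ok else "handshake failed", "-", detail)
```

A refused connection would suggest nothing is listening on the port, while an SSL error after the TCP connect succeeds would point at the handshake itself.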

 

Anyone out there have a successful instance of OCUM64p1 on a Windows server?

 

-----

2016-08-22 09:49:29,761  INFO [Thread-2] com.onaro.sanscreen.acquisition.framework.Main (Main.java:156) - Shutting down by external request

2016-08-22 09:52:45,491  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.Main (Main.java:61) - Main - Starting acquisition...

2016-08-22 09:52:45,522  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.ServiceManager (JBossLoggerAdapter.java:156) - Importing general preferences from file:/C:/Program%20Files/NetApp/essentials/au/conf/preferences/general.properties

2016-08-22 09:52:45,585  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.FrameworkService (FrameworkService.java:79) - Asynchronous Task Manager - Initialized!

2016-08-22 09:52:45,585  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.FrameworkService (FrameworkService.java:79) - Logs Manager - Initialized!

2016-08-22 09:52:45,585  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:156) - Attempting to verify server at: https://localhost:20443

2016-08-22 09:52:45,913  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:165) - SSL handshake failure for https://localhost:20443

2016-08-22 09:52:45,913  INFO [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:148) - Server up!

2016-08-22 09:52:45,975 ERROR [WrapperStartStopAppMain] com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager (JBossLoggerAdapter.java:248) - 

com.onaro.sanscreen.acquisition.framework.FrameworkException: Software caused connection abort: recv failed

at com.onaro.sanscreen.acquisition.framework.mgmt.CertificateDownloader.downloadIfNeeded(CertificateDownloader.java:131)

at com.onaro.sanscreen.acquisition.framework.mgmt.HttpBasedUrlConnection.downloadCertificate(HttpBasedUrlConnection.java:187)

at com.onaro.sanscreen.acquisition.framework.mgmt.HttpBasedUrlConnection.downloadCertificate(HttpBasedUrlConnection.java:178)

at com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager.setUpConnection(CommunicationManager.java:420)

at com.onaro.sanscreen.acquisition.framework.mgmt.CommunicationManager.doStart(CommunicationManager.java:133)

at com.onaro.sanscreen.acquisition.framework.mgmt.FrameworkService.start(FrameworkService.java:75)

at com.onaro.sanscreen.acquisition.framework.mgmt.ServiceManager.doStart(ServiceManager.java:209)

at com.onaro.sanscreen.acquisition.framework.mgmt.FrameworkService.start(FrameworkService.java:75)

-----
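When scanning a long au.log for entries like the ones above, a small filter can help. The following is a rough sketch that assumes every log line follows the layout seen in this snippet (timestamp, level, thread in brackets, logger, source location in parentheses, then the message); the names `parse_line` and `interesting` are made up for illustration. It keeps ERROR entries plus any message containing a chosen keyword, and skips lines that do not match the layout, such as stack-trace frames.

```python
import re

# Layout inferred from the au.log lines quoted above.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s+\((?P<src>[^)]+)\)\s+-\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a log line, or None if it does not match."""
    match = LINE.match(line.strip())
    return match.groupdict() if match else None


def interesting(lines, keywords=("SSL handshake failure",)):
    """Keep ERROR entries and entries whose message contains a keyword."""
    hits = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # blank lines, stack-trace frames, separators
        if record["level"] == "ERROR" or any(k in record["msg"] for k in keywords):
            hits.append(record)
    return hits
```

It can be fed an open file directly, for example `interesting(open(path))`, where `path` points at wherever au.log lives on the server.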

 

I've attached the most recent log files, which appear to be updated as discovery is attempted.