OnCommand 5.0 becomes unusable due to "can't connect to host (err=10055)"

I've been running OnCommand 5.0 in my lab for well over six months with no issues. Over the past week, however, my OnCommand 5.0 server has been behaving erratically. All of the consoles and the NMC start throwing errors and the product becomes unusable, yet all of the DFM services report that they're running. Stopping and restarting the DFM services does not resolve the issue. So far, the only thing that works is rebooting the entire server, which clears the issue until it recurs a few days later. All of the filers monitored by DFM appear to be working fine: there are no events in their messages files and no problems connecting to them via SSH. Once I reboot the DFM server, it has no problem connecting to them for monitoring and data collection, until the issue pops up again a few days later.

The recurring error message is err=10055, for which I cannot find any information on the NOW site or in the DFM FAQ. I see this error in the DFMmonitor.log file, and I also get it when attempting to log in to the NMC. Looking at the DFM server log files, I see this error for each and every host I'm attempting to monitor.

DFMmonitor.log example:

Feb 24 10:03:45 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6080a:/StoragexServicexExample: Can't connect to host (err=10055).

Feb 24 10:03:46 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/datastore2: Can't connect to host (err=10055).

Feb 24 10:03:46 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab3070drb:/vol0: Can't connect to host (err=10055).

Feb 24 10:03:46 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040b:/db2arch: Can't connect to host (err=10055).

Feb 24 10:03:47 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6080a:/wmoradata4: Can't connect to host (err=10055).

Feb 24 10:03:47 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/perm_nsvm_p2_source: Can't connect to host (err=10055).

Feb 24 10:03:47 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab3070drb:/SnapManager_20111218184540909_largedb_oradata2: Can't connect to host (err=10055).

Feb 24 10:03:47 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040b:/db2temp: Can't connect to host (err=10055).

Feb 24 10:03:48 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6080a:/selab2_vmfs_lab6080a_perm: Can't connect to host (err=10055).

Feb 24 10:03:48 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/datastore3: Can't connect to host (err=10055).

Feb 24 10:03:48 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6040b:/db2logs1: Can't connect to host (err=10055).

Feb 24 10:03:48 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/OraAppsu01: Can't connect to host (err=10055).

Feb 24 10:03:49 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6040b:/pw_oralogs1: Can't connect to host (err=10055).

Feb 24 10:03:49 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/oracle_software: Can't connect to host (err=10055).

Feb 24 10:03:49 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6040a:/freescale_source: Can't connect to host (err=10055).

Feb 24 10:03:50 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/software: Can't connect to host (err=10055).
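
A quick way to confirm that the failures span every monitored host, rather than one misbehaving filer, is to tally the error lines per host. The sketch below assumes every relevant line has the shape shown in the excerpt above; the regex, `count_connect_failures`, and the other names are hypothetical helpers, not part of DFM. For reference, 10055 is the Winsock error code WSAENOBUFS ("no buffer space available"), which Windows can report when it runs out of resources for new sockets, including free dynamic ports.

```python
import re
from collections import Counter

# Assumed line shape, taken from the DFMmonitor.log excerpt:
#   <timestamp> [DFMMonitor:ERROR]: [<pid>:<thread>]: userquota:<host>:<path>: Can't connect to host (err=<code>).
LINE_RE = re.compile(
    r"\[DFMMonitor:ERROR\]: \[[^\]]+\]: "
    r"userquota:(?P<host>[^:]+):(?P<path>[^:]+): "
    r"Can't connect to host \(err=(?P<err>\d+)\)"
)

# Winsock error names for codes of interest; 10055 is WSAENOBUFS.
WINSOCK_NAMES = {10055: "WSAENOBUFS (no buffer space available)"}

# Three lines copied verbatim from the excerpt above.
SAMPLE_LINES = [
    "Feb 24 10:03:45 [DFMMonitor:ERROR]: [1800:0x10c4]: userquota:lab6080a:/StoragexServicexExample: Can't connect to host (err=10055).",
    "Feb 24 10:03:46 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/datastore2: Can't connect to host (err=10055).",
    "Feb 24 10:03:48 [DFMMonitor:ERROR]: [1800:0x8a8]: userquota:lab6040a:/datastore3: Can't connect to host (err=10055).",
]


def count_connect_failures(lines):
    """Tally 'Can't connect to host' errors by (host, error code)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("host"), int(match.group("err")))] += 1
    return counts


if __name__ == "__main__":
    # Against a real log, pass an open file object instead of SAMPLE_LINES.
    for (host, err), n in sorted(count_connect_failures(SAMPLE_LINES).items()):
        print(f"{host}: {n} failure(s), err={err} {WINSOCK_NAMES.get(err, 'unknown')}")
```

If every host shows the same code at roughly the same time, the problem is more likely on the DFM server side than on any individual filer.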

I've made only two changes to this server recently. First, I installed a lightweight graphical SNMP browser, but it's not running and doesn't run as a background service. Second, I added a large number of performance thresholds to my Performance Advisor environment.

Does anyone know what err=10055 refers to? I have a sneaking suspicion that I'm running out of some resource on my DFM server, but I appear to have plenty of RAM and CPU available when this error occurs.

DFM Server log files are attached.
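
One way to test the resource-exhaustion suspicion is to count how many TCP connections hold local ports in the Windows dynamic (ephemeral) port range. On Windows Server 2008 R2 that range defaults to 49152-65535 (16384 ports), and outbound connections, such as the ones DFM opens to each filer, typically draw their local port from it; closed sockets also keep their port while they sit in TIME_WAIT. The sketch below is a rough gauge, not a DFM tool: it parses saved `netstat -an` output, assuming the usual Proto / Local Address / Foreign Address / State columns, and the function name and sample addresses are hypothetical.

```python
from collections import Counter

# Default TCP dynamic (ephemeral) port range on Windows Server 2008 R2: 49152-65535.
DEFAULT_START = 49152
DEFAULT_COUNT = 16384

# Hypothetical `netstat -an` excerpt; addresses are illustrative only.
SAMPLE_NETSTAT = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    10.0.0.5:49153         10.0.0.20:443          ESTABLISHED
  TCP    10.0.0.5:49154         10.0.0.21:443          TIME_WAIT
  TCP    10.0.0.5:49155         10.0.0.21:443          TIME_WAIT
  UDP    0.0.0.0:500            *:*
"""


def dynamic_port_usage(netstat_text, start=DEFAULT_START, count=DEFAULT_COUNT):
    """Count TCP rows whose local port lies in [start, start + count - 1], grouped by state."""
    end = start + count - 1
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have exactly four columns: Proto, Local, Foreign, State.
        if len(fields) != 4 or fields[0] != "TCP":
            continue
        # rpartition handles both "10.0.0.5:49153" and IPv6 "[::]:135".
        port_text = fields[1].rpartition(":")[2]
        if port_text.isdigit() and start <= int(port_text) <= end:
            states[fields[3]] += 1
    return states


if __name__ == "__main__":
    usage = dynamic_port_usage(SAMPLE_NETSTAT)
    used = sum(usage.values())
    print(f"{used} of {DEFAULT_COUNT} dynamic ports in use: {dict(usage)}")
```

On the real server, capture the input with `netstat -an > netstat.txt` while the errors are occurring. A count approaching the size of the range, especially one dominated by TIME_WAIT, would be consistent with dynamic port exhaustion.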

Re: OnCommand 5.0 becomes unusable due to "can't connect to host (err=10055)"

More info:

I'm running OnCommand 5.0 64-bit on a Windows Server 2008 R2 64-bit guest in vSphere 4, with 4 GB of RAM and 2 vCPUs. The filesystem where OnCommand 5.0 is installed has 20% free space. RAM usage never exceeds 50%, and the CPUs are rarely busy. This environment worked great until the past week or so, and I can't figure out what caused the sudden change in behavior.

Re: OnCommand 5.0 becomes unusable due to "can't connect to host (err=10055)"

Looks like I hit BURT 536261. To work around it, I increased the number of dynamic ports on my Windows server, and so far everything appears to be working well. I've thrashed Performance Advisor by viewing weeks and weeks' worth of counter data, and it appears to be working fine. If I encounter this issue again, I'll post it here.

C:\> netsh int ipv4 set dynamicportrange protocol=tcp startport=5000 numberofports=60536
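
The values in the command above are chosen so that the range runs from port 5000 up to 65535, the highest TCP port: 5000 + 60536 - 1 = 65535. Compared with the Windows Server 2008 R2 default of 16384 ports (49152-65535), that is roughly 3.7 times as many ports available for outbound connections. The small helper below only does this arithmetic; its name is hypothetical, and the minimum start port (1025) and minimum range size (255) are assumptions based on Microsoft's published guidance for the dynamic port range, worth verifying for your Windows version. The active range can be checked with `netsh int ipv4 show dynamicportrange tcp`.

```python
MIN_START_PORT = 1025  # assumed lowest start port Windows accepts for the dynamic range
MIN_PORT_COUNT = 255   # assumed smallest range size Windows accepts
MAX_PORT = 65535       # highest valid TCP port number


def dynamic_range_end(start_port, number_of_ports):
    """Return the last port of a netsh-style dynamic range after checking its bounds."""
    if start_port < MIN_START_PORT:
        raise ValueError(f"startport must be at least {MIN_START_PORT}")
    if number_of_ports < MIN_PORT_COUNT:
        raise ValueError(f"numberofports must be at least {MIN_PORT_COUNT}")
    end = start_port + number_of_ports - 1
    if end > MAX_PORT:
        raise ValueError(f"range ends at {end}, past {MAX_PORT}")
    return end


if __name__ == "__main__":
    print(dynamic_range_end(49152, 16384))  # Windows Server 2008 R2 default: ends at 65535
    print(dynamic_range_end(5000, 60536))   # the netsh command above: also ends at 65535
    print(round(60536 / 16384, 2))          # about 3.69 times as many ports as the default
```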

Re: OnCommand 5.0 becomes unusable due to "can't connect to host (err=10055)"

Hi Reid

I have exactly the same problem with OnCommand Core.

Thanks for your solution; I will try it.

Best regards

Reto