I'm wondering if anyone else is having an issue with SYM CLI data sources failing? We restart, achieve acquisition success, and then a day or two later get "Failed to execute Sym CLI symcfg". For whatever reason, the two SYM data sources we have continue to fail. I've been searching the KB, trying to find some leads. Is there a break-fix out there that I'm not seeing? Or perhaps a bug in the system? We recently upgraded to 6.4.1, but we had this issue prior to the upgrade. It just happens much more frequently now.
Thanks and Happy Thanksgiving!
I've seen situations where the SYM data source would acquire successfully once and fail on all subsequent attempts. I believe this was ultimately an issue with the symapi server. On many occasions, rebooting the symapi server was necessary to get acquisition to run. Have you tried this?
The process I'd follow to troubleshoot this is something like:
1. Find a failure in the logs (SANscreen\acq\log; look at acq.log for general stuff and foundation_<datasource name>_<big number>.log for specifics) and determine what command it's failing on. It may also be failing prior to issuing any commands, or while inserting results into the database; this would be important.
2. Run the failing command at the CLI and get it to work there
3. See if the issue is resolved
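For step 1, a rough sketch of how you could sweep the log folder for failure lines instead of paging through each file by hand. The function name and the default search string are my own assumptions (the string is taken from the error quoted in this thread); point it at wherever your SANscreen\acq\log folder actually lives.

```python
from pathlib import Path


def find_failures(log_dir, needle="Failed to execute"):
    """Return (file name, line number, text) for every *.log line containing needle."""
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path.name, number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_failures.py <path to the acq log folder>
    for name, number, text in find_failures(sys.argv[1]):
        print(f"{name}:{number}: {text}")
```

This only tells you where the failure was logged; you still need to read around the hit to see which command was being run.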
If you open a case, Support can help you with this process.
Just to go a bit further, when I look at the foundation logs for this particular data source, the only information on a failure I see would be the below string from a foundation log - does this say anything to you?
SYMxxx [Failed to execute external utility] - Failed to execute Sym CLI symcfg ([Device name General Device]: Return code : 1)
com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException: Failed to execute Sym CLI symcfg
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
OK, you got a return code of 1. 0 is good, nonzero is bad, so EMC's CLI is telling you something went wrong.
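If you end up re-running the failing command by hand, something like the sketch below captures the return code and the error text in one go. The helper names are made up for illustration, and I am deliberately not guessing at symcfg arguments here: pass in exactly the command line you find in the logs or the recording.

```python
import subprocess


def run_command(command):
    """Run a command (list of arguments) and return (return_code, stdout, stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def succeeded(return_code):
    """0 is good, anything else is bad."""
    return return_code == 0


if __name__ == "__main__":
    import sys

    # Usage: python run_command.py <command> [arguments...]
    code, out, err = run_command(sys.argv[1:])
    print("return code:", code, "(ok)" if succeeded(code) else "(failed)")
    if err:
        print("stderr:", err.strip())
```

Keep in mind that output from a command run under your own account may differ from what OCI sees when it runs the same command as a service.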
You want to crack open the recording:
You have been in the root folder, most likely, reading the log file, which would have had the message you posted (you may have gotten that from the OCI gui).
We need to drill into the numerical folder (its name is the number of milliseconds since the Unix epoch).
That folder holds all the raw data and the processed data (if any).
You want to read your files that have symcfg in their name - open them, read them, find something wrong.
You may want to do this by working backwards, chronologically.
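Since the folder names are milliseconds since the Unix epoch, a small sketch like this can turn them into readable timestamps and list the folders newest first, which makes the "work backwards" approach easier. The function names are hypothetical, and it assumes the recording folders have purely numeric names as described above.

```python
from datetime import datetime, timezone
from pathlib import Path


def folder_timestamp(name):
    """Interpret a folder name as milliseconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(int(name) / 1000, tz=timezone.utc)


def recordings_newest_first(root):
    """Return the numerically named subfolders of root, most recent first."""
    folders = [p for p in Path(root).iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(folders, key=lambda p: int(p.name), reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: python recordings.py <folder that contains the numeric folders>
    for folder in recordings_newest_first(sys.argv[1]):
        print(folder_timestamp(folder.name).isoformat(), folder)
```

The timestamps are printed in UTC, so allow for your server's time zone when lining them up with the failure times shown in the OCI GUI.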
Yes, there is something wrong with the remote connection to the server. Issue with the port? Firewall problem? Unfortunately the EMC guys are a bit sparse due to the holiday, and I'll have to follow up with them on Monday. I just don't understand what would cause such regular failures recently when it didn't in the past. Something changed that they aren't telling me about or aren't aware of. New software version, new hardware... something has changed. Hopefully I get to the bottom of it, because the data is now quite old.
OCI and EMC Symmetrix 101.
OCI wants to discover EMC Symmetrix arrays by parsing XML formatted Symcli (Solutions Enabler CLI) output.
How does it get that data:
#1. 95% of the time, the OCI server or acquisition unit is acting as a client - Solutions Enabler is a client server program. In this mode you need:
The OCI server to be running the same, or older version of Solutions Enabler as the server you are talking to
The storsrvd daemon needs to be installed and running on the Solutions Enabler server you are talking to (please make your customer install this daemon and set it to autostart, so that when they reboot their SE server, OCI inventory doesn't fail because the storsrvd daemon is no longer running)
#2. 5% of the time customers provision FC gatekeeper volumes to OCI, or run an OCI Remote Acquisition Unit (RAU) on a Solutions Enabler host (with direct FC connectivity to arrays) - in this config, we choose "local" under "Connection Caching" in the datasource, as we are running commands directly against the array. I have been working with a customer this week that does almost all their Symmetrix discovery this way - they are building 8 GB VMs with RDM gatekeepers. They have the SMI-S enabled version of Solutions Enabler installed, and we use symcli for inventory and SMI-S for performance gathering. It is also worth mentioning that in this config the OCI datasource still needs to have the "service name" attribute populated, but its contents are irrelevant, as the netcnfg alias file is not used for local communication.
So, let's continue with #1, the client-server model.
We established you need Solutions Enabler installed. You also need to edit your netcnfg file, in ..\emc\symapi\config
This file defines aliases; the alias is what goes in the first field on the Inventory page in your OCI datasource creation.
Your OCI datasource also needs the right path to the symcli binaries set. Finally, the default connection caching setting, REMOTE_CACHED, should work fine.
What can go wrong?
The storsrvd daemon is not running on the host we defined in our netcnfg alias - to troubleshoot from OCI, open your netcnfg file, extract the IP address or FQDN of the SE host, and try:
telnet NNNNN 2707, where NNNNN is the IP/FQDN
Assuming telnet is installed - if this times out, there is nothing listening on tcp 2707, or there is a firewall. If you get a blinking cursor, the daemon is up.
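If telnet is not installed on the OCI server, the same check can be sketched in a few lines of Python. The function name is my own; the port default of 2707 is the one used above. A successful connect only shows that something is listening on that port, not that the daemon is healthy.

```python
import socket


def port_is_open(host, port=2707, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    import sys

    # Usage: python port_check.py <IP or FQDN from your netcnfg file>
    host = sys.argv[1]
    state = "open" if port_is_open(host) else "closed, filtered or unreachable"
    print(f"{host}:2707 is {state}")
```

Run it from the OCI server or acquisition unit itself, so that you are testing the same network path the datasource uses.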
Once you know the status of the daemon, you either work on getting the daemon running (and please work on getting it set to autostart), or move on to troubleshooting your problem.
Troubleshooting is probably another post. There are 2 threads for it:
Cracking open recordings, and reading them
Setting environment variables manually, and running commands manually. This approach has been degraded somewhat in value due to some Symmetrix security configurations that mean your output (run as you) may vary from what OCI (running as hostname\SYSTEM) will see
I'm going to share this with the EMC engineers to see if any of this has been done. We never set up an RAU, it's strictly local.
I did attempt to telnet the IP on port 2707. Initially, a blank screen with a non-blinking green cursor popped up. I tried with another IP, and a similar screen popped up, and went away almost immediately.
Very good to know. We have restarted the symapi server multiple times, we get a success, then a failure again. We'll have to investigate a bit further - it used to be successful without fail, but it's been doing this intermittent success/fail/success/fail for over a month.
Thank you for your help!
We have never seen this happen before, ever.
You want to crack open the recordings - the foundation.X...zip in ../sanscreen/acq/log and determine what command is failing
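A minimal sketch for peeking inside a recording zip without extracting it, listing only the members with symcfg in their name. The function name is hypothetical, and the layout inside the zip is not something I am asserting; this just filters on the member names.

```python
import zipfile


def symcfg_members(zip_path, needle="symcfg"):
    """Return the names of archive members whose name contains needle (case-insensitive)."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if needle in name.lower()]


if __name__ == "__main__":
    import sys

    # Usage: python list_symcfg.py <path to a recording zip>
    for name in symcfg_members(sys.argv[1]):
        print(name)
```

From there you can extract just those members and read them for the actual error text.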
Is your customer sending OCI ASUP?
Ha! Good to know we are blazing new failure trails here.
At the moment, the customer isn't using ASUP. Let me take a look at the logs, and I may need to put in a ticket. I was just curious if there was a common issue and an "easy" fix.