Active IQ Unified Manager Discussions

dfm run cmd hanging all dfm commands

madden
6,944 Views

I tried to use “dfm run cmd filer01 aggr show_space > file.out” at a customer DFM server (ancient 4.0 by the way) and it hung the cmd prompt. Now if I run any “dfm” command, even “dfm help” it outputs that previous command that was issued and then hangs that new cmd prompt.  From host diag web view I see that RSH and SSH are both disabled on the original host, so I tried enabling RSH, and now host diag shows success via RSH, but that stuck job remains stuck.   I already rebooted the DFM server but the behavior hasn’t changed; it seems to remember it wanted to run that job.

Now that my job has hung, other commands (like the rsh check of vfiler quotas, dfmrotatelogs, etc.) appear to be hanging too.

I checked the dfm logs but didn’t find any messages that were helpful.

Has anyone seen this before, and does anyone know how to cancel a hung CLI job?  I checked the web GUI under that host hoping to cancel it there, but it's not shown in the running list.

Thanks,

Chris


kryan

Chris,

The man page entry for "dfm run" (copied below) shows the available sub-commands.

Have you tried the status and/or delete commands yet?

Kevin

dfm run
This command manages the execution of remote commands.

dfm run attempt [ -t timeout ] [ -f ] [ job-id ... ]

Trigger an attempt to run the listed jobs.

The optional timeout argument sets a maximum time limit (in seconds) for the command to complete on the host. This timeout overrides the timeout set by dfm run submit. The optional -f flag asks the DataFabric Manager server to force execution of the job even if the reschedule time has not been reached. The optional job-id arguments will limit the job attempts to only the listed jobs. If no job-id is given, all jobs will be attempted. If the -t timeout is triggered on a particular host, that part of the job will be marked as ``failed'' (i.e.: it will not be rescheduled).

dfm run cmd [ -t timeout ] [ -R retry-count ] [ -r ] { host | group | - } command ...

This is the simplest way to run a command on one or more hosts.

The optional timeout argument sets a maximum time limit (in seconds) for the command to complete on the client. The default is 0 (wait forever). The -R retry-count parameter can be used to optionally specify the number of times the job should be retried in case of errors. The optional -r flag runs the command via the host's hostRLMAddress using the ssh protocol instead of using the hostPrimaryAddress and the default protocol. Using - instead of a host or a group causes the command to read standard input for a list of hosts to execute the command on. The command is the command string to execute on the host. If command matches an alias name, then the command string is substituted by the sequence of commands defined in the alias definition.

dfm run delete [ -k ] job-id ...

Delete a pending or running job.

If you try to delete a currently running job, a warning message will be printed and the job will not be deleted. The -k option will force the job to be killed and deleted in these situations.

dfm run status [ job-id ... ]

Show the status of the requested jobs. If no job-id arguments are given, then status will be listed for all jobs.

dfm run submit [ options ] [ -a ] [ -r ] { host | group | - } command ...

The options are

-j job-id: normally the command chooses a job-id automatically; this option allows you to specify a particular job ID.

-t timeout: set a maximum limit in seconds for the command to complete on the remote host. The default is 300 seconds.

-a: attempt to run the job immediately.

-r: attempt to run the job on the host's hostRLMAddress using the ssh protocol.

dfm run wait [ -t timeout ] job-id

Wait until the job identified by job-id is complete. Wait forever or for timeout seconds.

dfm run alias import [ -d ] { path | - }

Import alias definitions from the specified path or standard input. All alias definitions are imported in a single transaction. If there is an error in importing, no new definition will be imported and old definitions will be preserved. If path is specified, alias definitions will be read from the file named path. If - is specified then alias definitions are read from the standard input.

See FORMAT FOR IMPORTING ALIAS DEFINITIONS for more information about the format of the alias definition file.

-d: delete all old alias definitions before importing the new ones.

dfm run alias export { path | - }

Export alias definitions to the specified path or standard output. If path is specified, alias definitions will be written into the file named path. If - is specified then alias definitions are written to the standard output.

dfm run alias list [ object ]

List all aliases available to the client. The object can be a group or a host. If object is specified, then list all aliases executable on the given object. If object is not specified, list all aliases that client can execute on any host.

dfm run alias reset

Delete all alias definitions.
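
For anyone scripting these commands, here is a minimal sketch of how the argument lists could be assembled. It relies only on what the excerpt above states: "dfm run cmd" waits forever unless -t is given, -R sets a retry count, and "dfm run delete -k" forces a running job to be killed. The helper names build_run_cmd and build_delete are hypothetical, and the sketch only builds the argument lists; it does not run dfm.

```python
def build_run_cmd(target, command, timeout=None, retries=None):
    """Build an argument list for 'dfm run cmd' (hypothetical helper).

    target  -- a host or group name, as in the man page synopsis
    command -- list of words to execute on the target
    timeout -- seconds; per the man page the default of 0 waits forever,
               so passing an explicit value avoids an unbounded wait
    retries -- retry count passed with -R
    """
    args = ["dfm", "run", "cmd"]
    if timeout is not None:
        args += ["-t", str(timeout)]
    if retries is not None:
        args += ["-R", str(retries)]
    args.append(target)
    args += list(command)
    return args


def build_delete(job_ids, kill=False):
    """Build an argument list for 'dfm run delete' (hypothetical helper).

    kill=True adds -k, which the man page says forces a running job
    to be killed and deleted.
    """
    args = ["dfm", "run", "delete"]
    if kill:
        args.append("-k")
    args += [str(j) for j in job_ids]
    return args


if __name__ == "__main__":
    # The original command from this thread, but with a 60-second limit.
    print(" ".join(build_run_cmd("filer01", ["aggr", "show_space"], timeout=60)))
    # Job ID 12 is made up for illustration.
    print(" ".join(build_delete([12], kill=True)))
```

The resulting lists could be handed to something like subprocess.run, ideally with its own timeout as a second safety net.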

kryan

Also, for future reference, be aware that "dfm run cmd" has a "-R" option to specify the number of retries for a job before giving up. This option is not listed in the CLI help.

madden

Thanks, but any dfm command, even just "dfm help", prints out the command I originally ran (“dfm run cmd filer01 aggr show_space 1> file.out”) and then proceeds to hang itself.  So there is no chance to use the CLI to kill the job, and the GUI doesn't appear to display the job either.

Maybe / hopefully when I return to the customer tomorrow it will have timed out, but if not, is there any way to remove this entry from the DB so that on restart of the server it won't try to launch it again?

Thanks,
Chris

kryan

There is a jobs table in the database, but it appears to only list the jobs submitted through the UI (at least on my server). 

I recommend opening a support case and attaching a DFMDC output to start (hopefully this will complete) so that we can resolve the condition.

Kevin

adaikkap

See if you can kill the dfm.exe process and restart the server; that may solve the problem.

Regards

adai

madden

Hi,

Wow.  Mystery solved and boy do I feel dumb.  I had created a file "dfm.bat" in that directory with a bunch of dfm command lines in it (including the first one that hangs because RSH/SSH hasn't been configured for some hosts).  Each time I run "dfm version" or the like it is actually running dfm.bat and passing the parameters as arguments.  Duh.

Thanks for the responses and sorry for wasting your time!

Thanks,

Chris
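
For readers who hit the same symptom, the cause can be illustrated with a rough sketch of how the Windows command prompt resolves a bare command name such as "dfm": it looks in the current directory before the PATH directories, trying each PATHEXT extension in order within each directory. A dfm.bat in the working directory therefore wins over a dfm.exe that is only reachable through PATH. The function name resolve_like_cmd is hypothetical, the default extension list is an assumed subset of a typical PATHEXT, and this is a simplified model rather than the exact shell behavior.

```python
import os


def resolve_like_cmd(name, cwd, path_dirs, pathext=(".com", ".exe", ".bat", ".cmd")):
    """Return the file a bare command name would resolve to, or None.

    Simplified model (an assumption, not the real shell code): search the
    current directory first, then each PATH directory in order, and within
    each directory try the PATHEXT extensions in order. Matching is
    case-insensitive, as on Windows.
    """
    for directory in [cwd, *path_dirs]:
        try:
            # Map lowercased names to the real names on disk.
            entries = {entry.lower(): entry for entry in os.listdir(directory)}
        except OSError:
            continue  # skip missing or unreadable directories
        for ext in pathext:
            candidate = (name + ext).lower()
            if candidate in entries:
                return os.path.join(directory, entries[candidate])
    return None
```

Calling resolve_like_cmd("dfm", os.getcwd(), os.environ["PATH"].split(os.pathsep)) from the directory where the commands hang would show whether a stray batch file is shadowing the real executable. On the command prompt itself, "where dfm" gives a similar listing.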
