[slurm-users] 'srun hostname' hangs on the command line

John Hearns hearnsj at googlemail.com
Tue Jul 17 01:56:57 MDT 2018


Ronan, sorry to ask, but this is a bit unclear.

Are you unable to launch ANY sessions with srun?
If so, you need to look at the logs to see why the job is not being
scheduled.

Is it only the hostname command which fails?

I assume you have already logged in to a node over ssh and run the
hostname command manually.



On 17 July 2018 at 09:50, Buckley, Ronan <Ronan.Buckley at dell.com> wrote:

> Yes I do.
>
>
>
> From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com]
> On Behalf Of Williams, Gareth (IM&T, Clayton)
> Sent: Tuesday, July 17, 2018 12:33 AM
> To: Slurm User Community List
> Subject: Re: [slurm-users] 'srun hostname' hangs on the command line
>
>
>
> Do you get the same problem as a non-root user?
>
>
>
> From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com]
> On Behalf Of Buckley, Ronan
> Sent: Tuesday, 17 July 2018 12:53 AM
> To: slurm-users at lists.schedmd.com
> Subject: [slurm-users] 'srun hostname' hangs on the command line
>
>
>
> Hi All,
>
>
>
> Verbose mode doesn’t show much.
>
> I hashed out the hostnames.
>
> Any ideas/suggestions?
>
>
>
> # srun hostname
> ^Csrun: interrupt (one more within 1 sec to abort)
> srun: task 0: unknown
> ^Z
> [1]+  Stopped                 srun hostname
> #
>
> # srun -v hostname
> srun: defined options for program `srun'
> srun: --------------- ---------------------
> srun: user           : `root'
> srun: uid            : 0
> srun: gid            : 0
> srun: cwd            : /root
> srun: ntasks         : 1 (default)
> srun: nodes          : 1 (default)
> srun: jobid          : 4294967294 (default)
> srun: partition      : default
> srun: profile        : `NotSet'
> srun: job name       : `(null)'
> srun: reservation    : `(null)'
> srun: burst_buffer   : `(null)'
> srun: wckey          : `(null)'
> srun: cpu_freq_min   : 4294967294
> srun: cpu_freq_max   : 4294967294
> srun: cpu_freq_gov   : 4294967294
> srun: switches       : -1
> srun: wait-for-switches : -1
> srun: distribution   : unknown
> srun: cpu_bind       : default (0)
> srun: mem_bind       : default (0)
> srun: verbose        : 1
> srun: slurmd_debug   : 0
> srun: immediate      : false
> srun: label output   : false
> srun: unbuffered IO  : false
> srun: overcommit     : false
> srun: threads        : 60
> srun: checkpoint_dir : /var/slurm/checkpoint
> srun: wait           : 0
> srun: nice           : -2
> srun: account        : (null)
> srun: comment        : (null)
> srun: dependency     : (null)
> srun: exclusive      : false
> srun: bcast          : false
> srun: qos            : (null)
> srun: constraints    :
> srun: geometry       : (null)
> srun: reboot         : yes
> srun: rotate         : no
> srun: preserve_env   : false
> srun: network        : (null)
> srun: propagate      : NONE
> srun: prolog         : (null)
> srun: epilog         : (null)
> srun: mail_type      : NONE
> srun: mail_user      : (null)
> srun: task_prolog    : (null)
> srun: task_epilog    : (null)
> srun: multi_prog     : no
> srun: sockets-per-node  : -2
> srun: cores-per-socket  : -2
> srun: threads-per-core  : -2
> srun: ntasks-per-node   : -2
> srun: ntasks-per-socket : -2
> srun: ntasks-per-core   : -2
> srun: plane_size        : 4294967294
> srun: core-spec         : NA
> srun: power             :
> srun: remote command    : `hostname'
> srun: Waiting for nodes to boot (delay looping 450 times @ 0.100000 secs x index)
> srun: Nodes ####### are ready for job
> srun: jobid 50871: nodes(1):`#######', cpu counts: 64(x1)
> srun: launching 50871.0 on host #######, 1 tasks: 0
> srun: route default plugin loaded
> srun: error: timeout waiting for task launch, started 0 of 1 tasks
> srun: Job step 50871.0 aborted before step completely launched.
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: Timed out waiting for job step to complete
> #
>
>
>
> Rgds
>
>
>