[slurm-users] problem running slurm

Brian Andrus toomuchit at gmail.com
Fri Feb 7 17:05:23 UTC 2020


You're trying to run bash, which, without special configuration, needs a pty.

Try:

srun -v -p debug --pty bash
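
To see whether a step actually has a terminal attached, a small check like the one below may help. It is only a sketch: has_tty is a hypothetical helper name (not part of Slurm), and the srun lines in the comments assume the same "debug" partition as above and are not run here.

```shell
# Report whether standard input is attached to a terminal.
# has_tty is a hypothetical helper name, not a Slurm command.
has_tty() {
    if [ -t 0 ]; then
        echo "tty"
    else
        echo "no tty"
    fi
}

# On a cluster, the two cases could be compared like this (assumed usage):
#   srun -p debug bash -c '[ -t 0 ] && echo tty || echo "no tty"'
#   srun -p debug --pty bash -c '[ -t 0 ] && echo tty || echo "no tty"'
```

If the step run with --pty reports a terminal and the one without it does not, the interactive shell is getting the pty it expects.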

Brian Andrus

On 2/6/2020 10:28 PM, Hector Yuen wrote:
> Hello,
>
> I am setting up a very simple configuration: one node running slurmd 
> and another one running slurmctld.
>
> In the slurmctld machine I run:
>
> srun -v -p debug bash -i
>
>
> And get this output
> srun: defined options
> srun: -------------------- --------------------
> srun: partition           : debug
> srun: verbose             : 1
> srun: -------------------- --------------------
> srun: end of defined options
> srun: jobid 41: nodes(1):`test116', cpu counts: 1(x1)
> srun: CpuBindType=(null type)
> srun: launching 41.0 on host test116, 1 tasks: 0
> srun: route default plugin loaded
> srun: error: task 0 launch failed: Slurmd could not set up environment for batch job
> srun: Node test116, 1 tasks started
>
> Enabled debug logging in slurmd.
>
> slurmd: debug3: in the service_connection
> slurmd: debug2: Start processing RPC: REQUEST_LAUNCH_TASKS
> slurmd: debug2: Processing RPC: REQUEST_LAUNCH_TASKS
> slurmd: launch task 45.0 request from UID:1000 GID:1000 HOST:169.254.1.32 PORT:2300
> slurmd: debug3: state for jobid 42: ctime:1581056522 revoked:1581056522 expires:1581056642
> slurmd: debug3: state for jobid 43: ctime:1581056533 revoked:1581056533 expires:1581056653
> slurmd: debug3: state for jobid 44: ctime:1581056623 revoked:1581056623 expires:1581056743
> slurmd: debug:  Checking credential with 384 bytes of sig data
> slurmd: debug:  task affinity : before lllp distribution cpu bind method is '(null type)' ((null))
> slurmd: debug3: task/affinity: slurmctld s 1 c 1; hw s 1 c 1 t 1
> slurmd: debug3: task/affinity: job 45.0 core mask from slurmctld: 0x1
> slurmd: debug3: task/affinity: job 45.0 CPU final mask for local node: 0x00000000000000000001
> slurmd: debug3: _lllp_map_abstract_masks
> slurmd: debug:  binding tasks:1 to nodes:1 sockets:1:0 cores:1:0 threads:1
> slurmd: lllp_distribution jobid [45] implicit auto binding: sockets,one_thread, dist 8192
> slurmd: _task_layout_lllp_cyclic
> slurmd: debug3: task/affinity: slurmctld s 1 c 1; hw s 1 c 1 t 1
> slurmd: debug3: task/affinity: job 45.0 core mask from slurmctld: 0x1
> slurmd: debug3: task/affinity: job 45.0 CPU final mask for local node: 0x00000000000000000001
> slurmd: debug3: _task_layout_display_masks jobid [45:0] 0x00000000000000000001
> slurmd: debug3: _lllp_map_abstract_masks
> slurmd: debug3: _task_layout_display_masks jobid [45:0] 0x00000000000000000001
> slurmd: debug3: _lllp_generate_cpu_bind 1 23 24
> slurmd: _lllp_generate_cpu_bind jobid [45]: mask_cpu,one_thread, 0x00000000000000000001
> slurmd: debug:  task affinity : after lllp distribution cpu bind method is 'mask_cpu,one_thread' (0x00000000000000000001)
> slurmd: debug2: _insert_job_state: we already have a job state for job 45.  No big deal, just an FYI.
> slurmd: _run_prolog: run job script took usec=4
> slurmd: _run_prolog: prolog with lock for job 45 ran for 0 seconds
> slurmd: debug3: _rpc_launch_tasks: call to _forkexec_slurmstepd
> slurmd: debug3: slurmstepd rank 0 (test116), parent rank -1 (NONE), children 0, depth 0, max_depth 0
> slurmd: debug3: _rpc_launch_tasks: return from _forkexec_slurmstepd
> slurmd: debug:  task_p_slurmd_reserve_resources: 45
> slurmd: debug2: Finish processing RPC: REQUEST_LAUNCH_TASKS
> slurmd: debug3: in the service_connection
> slurmd: debug2: Start processing RPC: REQUEST_TERMINATE_JOB
> slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
> slurmd: debug:  _rpc_terminate_job, uid = 1000
> slurmd: debug:  task_p_slurmd_release_resources: affinity jobid 45
> slurmd: debug:  credential for job 45 revoked
> slurmd: debug2: No steps in jobid 45 to send signal 18
> slurmd: debug2: No steps in jobid 45 to send signal 15
> slurmd: debug4: sent ALREADY_COMPLETE
> slurmd: debug2: set revoke expiration for jobid 45 to 1581056754 UTS
> slurmd: debug2: Finish processing RPC: REQUEST_TERMINATE_JOB
>
>
> Any ideas what could be going wrong here?
>
> Thanks
> -- 
> -h