[slurm-users] problem running slurm

Hector Yuen hector.yuen at gmail.com
Fri Feb 7 06:28:30 UTC 2020


Hello,

I am setting up a very simple configuration: one node running slurmd and
another one running slurmctld.
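
For reference, a minimal slurm.conf for a two-node setup like this looks roughly as follows (just a sketch: ctl-node stands in for the controller's hostname, and the node and partition values mirror what appears in the logs below):

ClusterName=test
SlurmctldHost=ctl-node
NodeName=test116 CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=test116 Default=YES MaxTime=INFINITE State=UP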

On the slurmctld machine I run:

srun -v -p debug bash -i


I get this output:
srun: defined options
srun: -------------------- --------------------
srun: partition           : debug
srun: verbose             : 1
srun: -------------------- --------------------
srun: end of defined options
srun: jobid 41: nodes(1):`test116', cpu counts: 1(x1)
srun: CpuBindType=(null type)
srun: launching 41.0 on host test116, 1 tasks: 0
srun: route default plugin loaded
srun: error: task 0 launch failed: Slurmd could not set up environment for batch job
srun: Node test116, 1 tasks started

I enabled debug logging in slurmd and captured the trace below.
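
Output at this level can be obtained by running slurmd in the foreground with verbosity raised, for example:

slurmd -D -vvvvv

Each -v raises the log level one step; setting SlurmdDebug=debug5 in slurm.conf is equivalent.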

slurmd: debug3: in the service_connection
slurmd: debug2: Start processing RPC: REQUEST_LAUNCH_TASKS
slurmd: debug2: Processing RPC: REQUEST_LAUNCH_TASKS
slurmd: launch task 45.0 request from UID:1000 GID:1000 HOST:169.254.1.32 PORT:2300
slurmd: debug3: state for jobid 42: ctime:1581056522 revoked:1581056522 expires:1581056642
slurmd: debug3: state for jobid 43: ctime:1581056533 revoked:1581056533 expires:1581056653
slurmd: debug3: state for jobid 44: ctime:1581056623 revoked:1581056623 expires:1581056743
slurmd: debug:  Checking credential with 384 bytes of sig data
slurmd: debug:  task affinity : before lllp distribution cpu bind method is '(null type)' ((null))
slurmd: debug3: task/affinity: slurmctld s 1 c 1; hw s 1 c 1 t 1
slurmd: debug3: task/affinity: job 45.0 core mask from slurmctld: 0x1
slurmd: debug3: task/affinity: job 45.0 CPU final mask for local node: 0x00000000000000000001
slurmd: debug3: _lllp_map_abstract_masks
slurmd: debug:  binding tasks:1 to nodes:1 sockets:1:0 cores:1:0 threads:1
slurmd: lllp_distribution jobid [45] implicit auto binding: sockets,one_thread, dist 8192
slurmd: _task_layout_lllp_cyclic
slurmd: debug3: task/affinity: slurmctld s 1 c 1; hw s 1 c 1 t 1
slurmd: debug3: task/affinity: job 45.0 core mask from slurmctld: 0x1
slurmd: debug3: task/affinity: job 45.0 CPU final mask for local node: 0x00000000000000000001
slurmd: debug3: _task_layout_display_masks jobid [45:0] 0x00000000000000000001
slurmd: debug3: _lllp_map_abstract_masks
slurmd: debug3: _task_layout_display_masks jobid [45:0] 0x00000000000000000001
slurmd: debug3: _lllp_generate_cpu_bind 1 23 24
slurmd: _lllp_generate_cpu_bind jobid [45]: mask_cpu,one_thread, 0x00000000000000000001
slurmd: debug:  task affinity : after lllp distribution cpu bind method is 'mask_cpu,one_thread' (0x00000000000000000001)
slurmd: debug2: _insert_job_state: we already have a job state for job 45. No big deal, just an FYI.
slurmd: _run_prolog: run job script took usec=4
slurmd: _run_prolog: prolog with lock for job 45 ran for 0 seconds
slurmd: debug3: _rpc_launch_tasks: call to _forkexec_slurmstepd
slurmd: debug3: slurmstepd rank 0 (test116), parent rank -1 (NONE), children 0, depth 0, max_depth 0
slurmd: debug3: _rpc_launch_tasks: return from _forkexec_slurmstepd
slurmd: debug:  task_p_slurmd_reserve_resources: 45
slurmd: debug2: Finish processing RPC: REQUEST_LAUNCH_TASKS
slurmd: debug3: in the service_connection
slurmd: debug2: Start processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug:  _rpc_terminate_job, uid = 1000
slurmd: debug:  task_p_slurmd_release_resources: affinity jobid 45
slurmd: debug:  credential for job 45 revoked
slurmd: debug2: No steps in jobid 45 to send signal 18
slurmd: debug2: No steps in jobid 45 to send signal 15
slurmd: debug4: sent ALREADY_COMPLETE
slurmd: debug2: set revoke expiration for jobid 45 to 1581056754 UTS
slurmd: debug2: Finish processing RPC: REQUEST_TERMINATE_JOB


Any ideas what could be going wrong here?

Thanks
-- 
-h