[slurm-users] CPU binding outside of job step allocation
Rutledge, Chris
crutledge at renci.org
Fri Jun 10 13:48:47 UTC 2022
Hello Everyone,
I'm seeing an odd issue with the latest version of Slurm (22.05.0) when submitting jobs to the queue from a compute node. Not every job triggers the issue, but I have a few that do. Here is one case that consistently fails at launch. I have not been able to reproduce the issue when submitting jobs from the login node.
Anyone seen anything like this?
##############################
# start interactive session
##############################
[crutledge@ht1 ~]$ /usr/bin/srun --pty /bin/bash -i -l
[crutledge@largemem-5-1 ~]$ cd hpcc/bin/gpu-6/
##############################
# job details
##############################
[crutledge@largemem-5-1 gpu-6]$ cat job
#!/bin/bash -l
#
#SBATCH --job-name=HPCC
#SBATCH -n 48
#SBATCH -p gpu
#SBATCH --mem-per-cpu=3975
module load icc/2022.0.2 env_icc/any mvapich2/2.3.7-intel
srun ./hpcc
mv hpccoutf.txt hpccoutf.txt.${SLURM_JOB_ID}
##############################
# submit the job
##############################
[crutledge@largemem-5-1 gpu-6]$ sbatch job
Submitted batch job 8533
##############################
# resulting error
##############################
[crutledge@largemem-5-1 gpu-6]$ cat slurm-8533.out
Loading icc version 2022.0.2
Loading compiler-rt version 2022.0.2
srun: error: CPU binding outside of job step allocation, allocated CPUs are: 0x000000000001000000000001.
srun: error: Task launch for StepId=8533.0 failed on node gpu-5-2: Unable to satisfy cpu bind request
srun: error: Application launch failed: Unable to satisfy cpu bind request
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 8533.0 ON gpu-5-1 CANCELLED AT 2022-06-10T09:38:19 ***
srun: error: gpu-5-1: tasks 0-46: Killed
mv: cannot stat ‘hpccoutf.txt’: No such file or directory
[crutledge@largemem-5-1 gpu-6]$
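
##############################
# decoding the CPU mask
##############################
The "allocated CPUs" value in the error is a hexadecimal bitmask. As a minimal sketch for reading it, assuming bit N of the mask corresponds to logical CPU N on the node, the bash function below lists the CPU IDs whose bits are set. The name mask_to_cpus is hypothetical and is not a Slurm tool. It walks the mask one hex digit at a time because the mask here is wider than bash's 64-bit integers.

```shell
#!/bin/bash
# mask_to_cpus (hypothetical helper): print the CPU IDs whose bits are set
# in a hexadecimal mask. Assumes bit N of the mask means logical CPU N.
mask_to_cpus() {
    local hex=${1#0x}          # drop an optional leading 0x
    local len=${#hex}
    local cpus=()
    local i digit bit
    for (( i = 0; i < len; i++ )); do
        # The i-th hex digit counted from the right covers CPUs 4*i .. 4*i+3.
        digit=$(( 16#${hex:len-1-i:1} ))
        for bit in 0 1 2 3; do
            if (( digit & (1 << bit) )); then
                cpus+=( $(( 4 * i + bit )) )
            fi
        done
    done
    echo "${cpus[*]}"
}

# Mask copied from the srun error above.
mask_to_cpus 0x000000000001000000000001
```

Comparing the printed CPU IDs against the number of tasks srun tried to place on that node may help show whether the binding request and the step allocation disagree.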