[slurm-users] pam_slurm_adopt not working for all users
juergen.salk at uni-ulm.de
Fri May 21 16:30:54 UTC 2021
* Tina Friedrich <tina.friedrich at it.ox.ac.uk> [210521 16:35]:
> If this is simply about quickly accessing nodes that they have jobs on to
> check on them - we tell our users to 'srun' into a job allocation (srun
Sadly, this does not always work in version 20.11.x any more because of the
new non-overlapping default behaviour for job step allocations.
$ sbatch -n 1 --wrap="srun sleep 600"
Submitted batch job 2550804
$ squeue --me
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
2550804  standard  wrap user01  R  0:06     1 n0326
$ srun --jobid=2550804 --pty /bin/bash
srun: Job 2550804 step creation temporarily disabled, retrying (Requested nodes are busy)
(and hangs forever until Ctrl-C'ed ...)
^Csrun: Cancelled pending job step with signal 2
srun: error: Unable to create step for job 2550804: Job/step already completing or completed
This now needs the --overlap option for both the job allocation itself and
the srun command that attaches the shell in order to always work as before.
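
As a rough sketch of what that could look like for the example above: I am assuming here that "the job allocation itself" means the srun step launched inside the batch script, and that --overlap is given to srun in both places. The two helper functions (submit_cmd, attach_cmd) are hypothetical and only print the command lines, so nothing is actually submitted when the snippet is run.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Slurm: they only print the command
# lines, so running this file does not submit or attach to anything.

# Batch job whose inner job step is started with --overlap (assumption:
# this is where the option goes for "the job allocation itself").
submit_cmd() {
    printf '%s\n' 'sbatch -n 1 --wrap="srun --overlap sleep 600"'
}

# Interactive shell attached to an already running job, also with --overlap.
# $1 is the job ID as reported by sbatch or squeue.
attach_cmd() {
    printf 'srun --overlap --jobid=%s --pty /bin/bash\n' "$1"
}

submit_cmd
attach_cmd 2550804
```

Whether this is sufficient may depend on the exact 20.11.x release and the site configuration, so it is worth checking against the srun man page of the installed version.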