[slurm-users] pam_slurm_adopt not working for all users
Juergen Salk
juergen.salk at uni-ulm.de
Fri May 21 16:30:54 UTC 2021
* Tina Friedrich <tina.friedrich at it.ox.ac.uk> [210521 16:35]:
> If this is simply about quickly accessing nodes that they have jobs on to
> check on them - we tell our users to 'srun' into a job allocation (srun
> --jobid=XXXXXX).
Hi Tina,
Sadly, this no longer always works in version 20.11.x, because job steps
within an allocation no longer overlap (share CPUs) by default.
$ sbatch -n 1 --wrap="srun sleep 600"
Submitted batch job 2550804
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2550804 standard wrap user01 R 0:06 1 n0326
$ srun --jobid=2550804 --pty /bin/bash
srun: Job 2550804 step creation temporarily disabled, retrying (Requested nodes are busy)
(and hangs forever until interrupted with Ctrl-C ...)
^Csrun: Cancelled pending job step with signal 2
srun: error: Unable to create step for job 2550804: Job/step already completing or completed
$
This now needs the --overlap option on both the job step already running in
the allocation (the srun in the batch script) and the srun command that
attaches the shell, in order to work reliably as before.
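A minimal sketch of that workaround, assuming bash and Slurm 20.11.x. The
function names submit_overlap_job and attach_shell and the DRY_RUN switch are
hypothetical conveniences, not Slurm features; with DRY_RUN=1 they only print
the command they would run.

```shell
#!/usr/bin/env bash
# Sketch of the --overlap workaround for Slurm 20.11.x (assumed version).
# submit_overlap_job, attach_shell and DRY_RUN are hypothetical helpers,
# not part of Slurm. With DRY_RUN=1 the commands are printed, not run.

# Submit a one-task batch job whose srun step allows other steps
# to overlap it on the job's CPUs.
submit_overlap_job() {
  local wrap="srun --overlap $*"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'sbatch -n 1 --wrap="%s"\n' "$wrap"
  else
    sbatch -n 1 --wrap="$wrap"
  fi
}

# Attach an interactive shell to a running job as an overlapping step.
attach_shell() {
  if [ -z "${1:-}" ]; then
    echo "usage: attach_shell JOBID" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'srun --jobid=%s --overlap --pty /bin/bash\n' "$1"
  else
    srun --jobid="$1" --overlap --pty /bin/bash
  fi
}
```

For example, "submit_overlap_job sleep 600" followed by "attach_shell 2550804"
would repeat the session above with both steps allowed to overlap, so the
interactive step should no longer wait for "Requested nodes are busy".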
Best regards
Jürgen