[slurm-users] pam_slurm_adopt not working for all users

Juergen Salk juergen.salk at uni-ulm.de
Fri May 21 16:30:54 UTC 2021


* Tina Friedrich <tina.friedrich at it.ox.ac.uk> [210521 16:35]:

> If this is simply about quickly accessing nodes that they have jobs on to
> check on them - we tell our users to 'srun' into a job allocation (srun
> --jobid=XXXXXX).

Hi Tina,

sadly, this does not always work in version 20.11.x any more because of the
new non-overlapping default behaviour for job step allocations.

$ sbatch -n 1 --wrap="srun sleep 600"
Submitted batch job 2550804
$ squeue --me
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    2550804  standard     wrap user01   R       0:06      1 n0326

$ srun --jobid=2550804 --pty /bin/bash
srun: Job 2550804 step creation temporarily disabled, retrying (Requested nodes are busy)

(and hangs forever until Ctrl-C'ed ...)

^Csrun: Cancelled pending job step with signal 2
srun: error: Unable to create step for job 2550804: Job/step already completing or completed
$

This now needs the --overlap option for both the job allocation itself and
the srun command that attaches the shell in order to always work as before.
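
As a rough sketch of what that could look like for the example job above
(attach_shell is just a hypothetical helper name and not part of Slurm, and
the SRUN override is only an assumption of mine to allow a dry run):

```shell
# Submit the job so that its own step allows overlapping (needs a Slurm cluster):
#   sbatch -n 1 --wrap="srun --overlap sleep 600"

# attach_shell JOBID: start an interactive shell inside a running job.
# Hypothetical helper, not part of Slurm. Set SRUN=echo to only print the
# arguments that would be passed to srun instead of running it.
attach_shell() {
    ${SRUN:-srun} --jobid="$1" --overlap --pty /bin/bash
}
```

Calling "attach_shell 2550804" should then attach a shell to the job from the
example, provided the step inside the job was also started with --overlap.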

Best regards
Jürgen




