[slurm-users] Jobs cancelled due to job requeue
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Sat Sep 3 07:59:13 UTC 2022
On 02-09-2022 20:52, Nicolas Sonoda wrote:
> I'm submitting a job, but after a few seconds it gets cancelled and the
> Slurm output file shows this message:
>
> slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT
> 2022-09-02T14:28:19 DUE TO JOB REQUEUE ***
>
> After this the job goes into the PD state in the queue, with the reason BeginTime:
>
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>   23884       gpu Memb.LS1     vhpc PD       0:00      1 (BeginTime)
>
> After a while the job stays in the RH state with the reason JobHoldMaxRequeue.
>
> I'm attaching my script and input files.
>
> Can you help me with that?
You could look in the slurmctld.log file and the node's slurmd.log file
to see what they say about the job.
Check your slurm.conf requeue configuration:
$ scontrol show config | grep Requeue
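
The checks above could be wrapped in two small shell helpers, roughly as
sketched below. This is only a sketch: the function names
show_requeue_settings and grep_job are made up for illustration, and it
assumes scontrol is on your PATH and that you can read the log files.

```shell
# Hypothetical helper: list the requeue-related settings and the log file
# locations (e.g. SlurmctldLogFile, SlurmdLogFile) from the running config.
show_requeue_settings() {
    scontrol show config | grep -i -E 'Requeue|LogFile'
}

# Hypothetical helper: print the lines of a log file that mention a job ID.
#   $1 = job ID, $2 = path to slurmctld.log or slurmd.log
# -w matches the ID as a whole word, so 23883 does not also match 238830.
grep_job() {
    grep -w -- "$1" "$2"
}
```

For the job in question you might run "grep_job 23883 <logfile>" on the
controller and on gn01, using the log paths that show_requeue_settings
prints, since the locations differ between sites. "scontrol show job 23884"
shows the current state and reason of the held job. Once the cause is
fixed, a held job can be released with "scontrol release <jobid>".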
/Ole