[slurm-users] Jobs cancelled due to job requeue

Nicolas Sonoda nicolas.sonoda at versatushpc.com.br
Fri Sep 2 18:52:52 UTC 2022


Hi!

I'm submitting a job, but after a few seconds it gets cancelled and the Slurm output file shows this message:

slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT 2022-09-02T14:28:19 DUE TO JOB REQUEUE ***

After this, the job goes into the PD state in the queue, with the reason BeginTime:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
23884       gpu Memb.LS1     vhpc PD       0:00      1 (BeginTime)

After a while, the job stays in the RH state with the reason JobHoldMaxRequeue.

I'm attaching my script and input files.

Can you help me with that?

Thank you.
Nícolas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MD_200_GPU.slurm
Type: application/octet-stream
Size: 165 bytes
Desc: MD_200_GPU.slurm
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20220902/d8586dd2/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: README
Type: application/octet-stream
Size: 2360 bytes
Desc: README
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20220902/d8586dd2/attachment-0001.obj>

