[slurm-users] Jobs cancelled due to job requeue
Nicolas Sonoda
nicolas.sonoda at versatushpc.com.br
Fri Sep 2 18:52:52 UTC 2022
Hi!
I'm submitting a job, but after a few seconds it gets cancelled and the Slurm output file shows this message:
slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT 2022-09-02T14:28:19 DUE TO JOB REQUEUE ***
After this, the job goes into the PD state in the queue, with the reason BeginTime:
JOBID  PARTITION  NAME      USER  ST  TIME  NODES  NODELIST(REASON)
23884  gpu        Memb.LS1  vhpc  PD  0:00  1      (BeginTime)
After a while, the job stays in the RH state with the reason JobHoldMaxRequeue.
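In case it helps anyone scanning many output files for the same symptom, here is a minimal sketch that pulls the job ID, node, timestamp and cause out of the slurmstepd message quoted above. The regular expression and the helper name `parse_cancel_line` are my own assumptions, based only on the single line shown here; they are not part of Slurm, and other Slurm versions may format the message differently.

```python
import re
from datetime import datetime

# Pattern written against the one message quoted in this post (an assumption,
# not an official Slurm log format specification).
CANCEL_RE = re.compile(
    r"\*\*\* JOB (?P<job_id>\d+) ON (?P<node>\S+) "
    r"CANCELLED AT (?P<when>\S+) DUE TO (?P<cause>.+?) \*\*\*"
)


def parse_cancel_line(line):
    """Return the fields of a slurmstepd cancellation line, or None if no match."""
    match = CANCEL_RE.search(line)
    if match is None:
        return None
    return {
        "job_id": int(match.group("job_id")),
        "node": match.group("node"),
        "when": datetime.strptime(match.group("when"), "%Y-%m-%dT%H:%M:%S"),
        "cause": match.group("cause"),
    }


line = (
    "slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT "
    "2022-09-02T14:28:19 DUE TO JOB REQUEUE ***"
)
info = parse_cancel_line(line)
print(info["job_id"], info["node"], info["cause"])
```

Lines that do not match the pattern return None, so the helper can be run over a whole output file without raising.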
I'm attaching my script and input files.
Can you help me with that?
Thank you.
Nícolas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MD_200_GPU.slurm
Type: application/octet-stream
Size: 165 bytes
Desc: MD_200_GPU.slurm
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20220902/d8586dd2/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: README
Type: application/octet-stream
Size: 2360 bytes
Desc: README
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20220902/d8586dd2/attachment-0001.obj>