[slurm-users] Problem with job allocation

Nicolas Sonoda nicolas.sonoda at versatushpc.com.br
Wed Mar 30 12:59:18 UTC 2022


I'm getting the following prolog error when I try to allocate more than 2 nodes with sbatch:

[2022-03-28T07:40:17.016] backfill: Started JobId=19825 in intel_large on n[01-05]
[2022-03-28T07:45:17.310] _run_prolog: timeout after 300s: killing pgid 45004
[2022-03-28T07:45:17.310] error: prolog_slurmctld JobId=19825 prolog exit status 0:9
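The two error lines can be read together. Here is a minimal sketch, assuming only the log wording quoted above, that pulls out the timeout, the killed process group, the job ID, and the `exit status 0:9` pair. It reads `0:9` as "exit code 0, terminated by signal 9 (SIGKILL)", the same code:signal convention Slurm uses elsewhere (e.g. sacct's ExitCode). That reading fits the preceding line: the prolog never exited by itself and was killed after 300 s. The regular expressions and the `parse_prolog_error` name are hypothetical and only cover the log format shown here.

```python
import re
import signal

# Hypothetical patterns matching the two slurmctld log lines quoted above;
# other Slurm versions may word these messages differently.
TIMEOUT_RE = re.compile(r"_run_prolog: timeout after (\d+)s: killing pgid (\d+)")
STATUS_RE = re.compile(r"prolog_slurmctld JobId=(\d+) prolog exit status (\d+):(\d+)")


def signal_name(num):
    """Map a signal number to its name, falling back to the number itself."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return str(num)


def parse_prolog_error(lines):
    """Extract timeout, killed pgid, job ID and exit code/signal from log lines."""
    info = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            info["timeout_s"] = int(m.group(1))
            info["pgid"] = int(m.group(2))
        m = STATUS_RE.search(line)
        if m:
            info["job_id"] = int(m.group(1))
            # Assumed "exit_code:signal" convention for the status pair.
            info["exit_code"] = int(m.group(2))
            info["signal"] = signal_name(int(m.group(3)))
    return info


log = [
    "[2022-03-28T07:45:17.310] _run_prolog: timeout after 300s: killing pgid 45004",
    "[2022-03-28T07:45:17.310] error: prolog_slurmctld JobId=19825 prolog exit status 0:9",
]
print(parse_prolog_error(log))
# {'timeout_s': 300, 'pgid': 45004, 'job_id': 19825, 'exit_code': 0, 'signal': 'SIGKILL'}
```

Read this way, the failure is the prolog running past its 300 s limit rather than the script returning an error code.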

I have this configuration for my queue:

PartitionName=intel_large Nodes=n[01-10] Default=NO MaxTime=72:00:00 MaxNodes=5 OverSubscribe=EXCLUSIVE State=UP
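To check whether the partition limits themselves could reject the job, here is a small sketch that splits the partition line above into key=value pairs and compares a requested node count with `MaxNodes` and the partition's node list. The helper names are hypothetical. The parsing is deliberately simplified: no comments, quoting, or line continuations, and only a single `prefix[NN-MM]` range in `Nodes`. Under this configuration a 5-node request fits. That matches the log, where backfill started JobId=19825 on n[01-05], so the failure happened after scheduling, in the prolog.

```python
import re


def parse_partition_line(line):
    """Split a slurm.conf PartitionName line into a dict of Key=Value pairs."""
    # Simplified: assumes whitespace-separated Key=Value tokens with no quoting.
    return dict(token.split("=", 1) for token in line.split())


def count_nodes(expr):
    """Count nodes in a simple hostlist such as 'n[01-10]' (single range only)."""
    m = re.fullmatch(r"([^\[]+)\[(\d+)-(\d+)\]", expr)
    if not m:
        return 1  # treat a plain hostname as a single node
    return int(m.group(3)) - int(m.group(2)) + 1


def fits_partition(conf, requested_nodes):
    """True if the request stays within MaxNodes and the partition's node count."""
    total = count_nodes(conf["Nodes"])
    max_nodes = conf.get("MaxNodes", "UNLIMITED")
    limit = total if max_nodes == "UNLIMITED" else min(int(max_nodes), total)
    return requested_nodes <= limit


line = ("PartitionName=intel_large Nodes=n[01-10] Default=NO MaxTime=72:00:00 "
        "MaxNodes=5 OverSubscribe=EXCLUSIVE State=UP")
conf = parse_partition_line(line)
print(count_nodes(conf["Nodes"]))  # 10
print(fits_partition(conf, 5))     # True
print(fits_partition(conf, 6))     # False
```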

I'm also attaching my slurmctld.prolog script.

Can you help me with that?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurmctld.prolog
Type: application/octet-stream
Size: 950 bytes
Desc: slurmctld.prolog
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20220330/aa9cc5e6/attachment.obj>
