[slurm-users] Job preempts entire host instead of single job
Michał Kadlof
michal.kadlof at pw.edu.pl
Tue Jan 17 12:04:03 UTC 2023
Hi,
I am struggling to configure job preemption. I have nodes with 8 Nvidia
A100 GPUs each and two partitions: short (lower priority) and sfglab
(higher priority). I want higher-priority jobs to preempt (in REQUEUE
mode) lower-priority jobs. It appears to work, but it works too well.
A job from the higher-priority partition preempts every job on the host
instead of only the single job whose resources would be enough for it.
What's more, it locks the remaining resources until the high-priority
job ends. What am I doing wrong?
Here is an example:
$ srun --test-only -G1 -c1 --mem 1M -p sfglab
srun: Job 501151 to start at 2023-01-17T12:46:01 using 1 processors on nodes dgx-1 in partition sfglab
srun: Preempts: 363278,501001,501029,501075,501076,501077,501120,501121
Preempting just one of these jobs would be enough to release the
required resources, instead of all eight.
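To make the mismatch easy to check, here is a minimal sketch that extracts the job IDs from the `Preempts:` line of the output above. The function name `parse_preempted_jobs` is hypothetical, and the only assumption is the output format shown in this message.

```python
import re


def parse_preempted_jobs(srun_output: str) -> list[int]:
    """Return the job IDs listed on the 'Preempts:' line, or [] if absent."""
    match = re.search(r"Preempts:\s*([\d,\s]+)", srun_output)
    if match is None:
        return []
    # Split the comma-separated list and skip empty fragments.
    return [int(job_id) for job_id in match.group(1).split(",") if job_id.strip()]


# Output copied from the srun --test-only run above.
sample = """srun: Job 501151 to start at 2023-01-17T12:46:01 using 1 processors on
nodes dgx-1 in partition sfglab
srun: Preempts: 363278,501001,501029,501075,501076,501077,501120,501121"""

jobs = parse_preempted_jobs(sample)
print(len(jobs), "jobs would be preempted:", jobs)
```

For a request of one GPU, one core and 1M of memory, a count greater than one is the symptom described here.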
Here is my config:
slurm.conf
(...)
DefMemPerCPU = 100
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
PreemptMode = REQUEUE
PreemptType = preempt/partition_prio
PreemptExemptTime = 00:00:00
SelectType = select/cons_tres
SelectTypeParameters = CR_CORE_MEMORY
(...)
PartitionName=short Nodes=dgx-[1-4],sr-[1-3] MaxTime=1-0 State=UP PriorityTier=10000 Default=YES DefaultTime=0-01:00:00 OverSubscribe=NO PreemptMode=requeue
PartitionName=sfglab Nodes=dgx-1 MaxTime=10-0 State=UP PriorityTier=20000 PreemptMode=off OverSubscribe=NO AllowAccounts=sfglab
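As a sanity check of the intended relationship between the two partitions, here is a rough sketch that parses the two partition definitions above and compares their PriorityTier values. The helper names `parse_partition` and `can_preempt` are hypothetical, and the check is deliberately simplified: it assumes preempt/partition_prio semantics and ignores node overlap and all other settings.

```python
def parse_partition(line: str) -> dict[str, str]:
    """Split a 'Key=Value Key=Value ...' partition definition into a dict."""
    return dict(token.split("=", 1) for token in line.split())


def can_preempt(high: dict[str, str], low: dict[str, str]) -> bool:
    """Simplified rule: the higher PriorityTier may preempt the lower one,
    unless the lower partition has PreemptMode=off."""
    return (
        int(high["PriorityTier"]) > int(low["PriorityTier"])
        and low.get("PreemptMode", "off").lower() != "off"
    )


# Partition definitions copied from the config above.
short = parse_partition(
    "PartitionName=short Nodes=dgx-[1-4],sr-[1-3] MaxTime=1-0 State=UP "
    "PriorityTier=10000 Default=YES DefaultTime=0-01:00:00 OverSubscribe=NO "
    "PreemptMode=requeue"
)
sfglab = parse_partition(
    "PartitionName=sfglab Nodes=dgx-1 MaxTime=10-0 State=UP "
    "PriorityTier=20000 PreemptMode=off OverSubscribe=NO AllowAccounts=sfglab"
)

print("sfglab can preempt short:", can_preempt(sfglab, short))
print("short can preempt sfglab:", can_preempt(short, sfglab))
```

This only confirms the direction of preemption between the partitions. It says nothing about how many jobs are selected for preemption, which is the actual problem.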
--
best regards | pozdrawiam serdecznie
*Michał Kadlof*
Head of the high performance computing center
Eden^N cluster administrator
Faculty of Mathematics and Computer Science
Warsaw University of Technology