[slurm-users] Job in "priority" status - resources available
Cumer Cristiano
CristianoMaria.Cumer at unibz.it
Wed Aug 2 12:09:52 UTC 2023
Hello,
I'm quite new to Slurm. I recently set up a small Slurm instance to manage our GPU resources, and I am seeing the following situation:
JOBID  STATE    TIME        ACCOUNT   PARTITION  PRIORITY  REASON     CPU  MIN_MEM  TRES_PER_NODE
1739   PENDING  0:00        standard  gpu-low    5         Priority   1    80G      gres:gpu:a100_1g.10gb:1
1738   PENDING  0:00        standard  gpu-low    5         Priority   1    80G      gres:gpu:a100-sxm4-80gb:1
1737   PENDING  0:00        standard  gpu-low    5         Priority   1    80G      gres:gpu:a100-sxm4-80gb:1
1736   PENDING  0:00        standard  gpu-low    5         Resources  1    80G      gres:gpu:a100-sxm4-80gb:1
1740   PENDING  0:00        standard  gpu-low    1         Priority   1    8G       gres:gpu:a100_3g.39gb
1735   PENDING  0:00        standard  gpu-low    1         Priority   8    64G      gres:gpu:a100-sxm4-80gb:1
1596   RUNNING  1-13:26:45  standard  gpu-low    3         None       2    64G      gres:gpu:a100_1g.10gb:1
1653   RUNNING  21:09:52    standard  gpu-low    2         None       1    16G      gres:gpu:1
1734   RUNNING  59:52       standard  gpu-low    1         None       8    64G      gres:gpu:a100-sxm4-80gb:1
1733   RUNNING  1:01:54     standard  gpu-low    1         None       8    64G      gres:gpu:a100-sxm4-80gb:1
1732   RUNNING  1:02:39     standard  gpu-low    1         None       8    40G      gres:gpu:a100-sxm4-80gb:1
1731   RUNNING  1:08:28     standard  gpu-low    1         None       8    40G      gres:gpu:a100-sxm4-80gb:1
1718   RUNNING  10:16:40    standard  gpu-low    1         None       2    8G       gres:gpu:v100
1630   RUNNING  1-00:21:21  standard  gpu-low    1         None       1    30G      gres:gpu:a100_3g.39gb
1610   RUNNING  1-09:53:23  standard  gpu-low    1         None       2    8G       gres:gpu:v100
Job 1736 is PENDING because there are no more a100-sxm4-80gb GPUs available, and its priority has risen with age to 5, as expected. Now another user submits job 1739, requesting a gres:gpu:a100_1g.10gb:1 that is free, but that job does not start either: it sits behind the higher-priority pending job 1736 with reason Priority. This is obviously not the desired outcome, and I believe I need to change the scheduling strategy. Could someone with more experience give me some hints?
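For reference, here is a minimal sketch of the slurm.conf settings I think are involved; the values below are illustrative guesses on my part, not what is actually configured on my cluster:

    # Illustrative scheduler settings (values are assumptions, not verified):
    SchedulerType=sched/backfill
    SchedulerParameters=bf_continue,bf_window=1440,bf_max_job_test=500
    PriorityType=priority/multifactor
    PriorityWeightAge=1000
    PriorityWeightFairshare=10000
    PriorityMaxAge=7-0
    # Backfill can only fill gaps if jobs carry realistic time limits, e.g.:
    PartitionName=gpu-low Nodes=... DefaultTime=04:00:00 MaxTime=2-00:00:00 State=UP

    # What is currently configured can be checked with:
    scontrol show config | grep -E 'SchedulerType|SchedulerParameters|PriorityType'

My understanding is that with backfill scheduling, job 1739 should be able to start on the free a100_1g.10gb slice as long as it would not delay the expected start of job 1736, which in turn requires time limits on the running jobs. Is that the right direction?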
Thanks, Cristiano