<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi Michael,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Indeed, I had the older scheduler loaded instead of backfill. I have updated the configuration and will check whether the scheduler now picks up the pending jobs.</div>
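<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
For reference, the relevant slurm.conf change is roughly the following. This is a sketch: I am assuming the previous value was sched/builtin, and that slurmctld has to be restarted for a different scheduler plugin to take effect.</div>
<pre># slurm.conf on the controller (previous value assumed to be sched/builtin)
SchedulerType=sched/backfill

# after restarting slurmctld, confirm which plugin is active:
#   scontrol show config | grep -i SchedulerType
</pre>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Backfill only starts a lower-priority job early if doing so will not delay the expected start of higher-priority jobs, and Slurm estimates those start times from job time limits, so realistic --time values (or a partition DefaultTime) should help it work well.</div>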
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Thanks</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Cristiano</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users <slurm-users-bounces@lists.schedmd.com> on behalf of slurm-users-request@lists.schedmd.com <slurm-users-request@lists.schedmd.com><br>
<b>Sent:</b> Wednesday, August 2, 2023 4:15 PM<br>
<b>To:</b> slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com><br>
<b>Subject:</b> slurm-users Digest, Vol 70, Issue 3</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">
Send slurm-users mailing list submissions to<br>
slurm-users@lists.schedmd.com<br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users">https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users</a><br>
or, via email, send a message with subject or body 'help' to<br>
slurm-users-request@lists.schedmd.com<br>
<br>
You can reach the person managing the list at<br>
slurm-users-owner@lists.schedmd.com<br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of slurm-users digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Job in "priority" status - resources available (Cumer Cristiano)<br>
2. Re: Job in "priority" status - resources available<br>
(Michael Gutteridge)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Wed, 2 Aug 2023 12:09:52 +0000<br>
From: Cumer Cristiano <CristianoMaria.Cumer@unibz.it><br>
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com><br>
Subject: [slurm-users] Job in "priority" status - resources available<br>
Message-ID:<br>
<PAVPR07MB91916B49909E972995CE0806E10BA@PAVPR07MB9191.eurprd07.prod.outlook.com><br>
<br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Hello,<br>
<br>
I'm quite new to Slurm. I recently set up a small Slurm instance to manage our GPU resources, and I have the following situation:<br>
<br>
<pre>JOBID  STATE    TIME        ACCOUNT   PARTITION  PRIORITY  REASON     CPU  MIN_MEM  TRES_PER_NODE
1739   PENDING  0:00        standard  gpu-low    5         Priority   1    80G      gres:gpu:a100_1g.10gb:1
1738   PENDING  0:00        standard  gpu-low    5         Priority   1    80G      gres:gpu:a100-sxm4-80gb:1
1737   PENDING  0:00        standard  gpu-low    5         Priority   1    80G      gres:gpu:a100-sxm4-80gb:1
1736   PENDING  0:00        standard  gpu-low    5         Resources  1    80G      gres:gpu:a100-sxm4-80gb:1
1740   PENDING  0:00        standard  gpu-low    1         Priority   1    8G       gres:gpu:a100_3g.39gb
1735   PENDING  0:00        standard  gpu-low    1         Priority   8    64G      gres:gpu:a100-sxm4-80gb:1
1596   RUNNING  1-13:26:45  standard  gpu-low    3         None       2    64G      gres:gpu:a100_1g.10gb:1
1653   RUNNING  21:09:52    standard  gpu-low    2         None       1    16G      gres:gpu:1
1734   RUNNING  59:52       standard  gpu-low    1         None       8    64G      gres:gpu:a100-sxm4-80gb:1
1733   RUNNING  1:01:54     standard  gpu-low    1         None       8    64G      gres:gpu:a100-sxm4-80gb:1
1732   RUNNING  1:02:39     standard  gpu-low    1         None       8    40G      gres:gpu:a100-sxm4-80gb:1
1731   RUNNING  1:08:28     standard  gpu-low    1         None       8    40G      gres:gpu:a100-sxm4-80gb:1
1718   RUNNING  10:16:40    standard  gpu-low    1         None       2    8G       gres:gpu:v100
1630   RUNNING  1-00:21:21  standard  gpu-low    1         None       1    30G      gres:gpu:a100_3g.39gb
1610   RUNNING  1-09:53:23  standard  gpu-low    1         None       2    8G       gres:gpu:v100
</pre>
<br>
<br>
<br>
Job 1736 is PENDING because no more a100-sxm4-80gb GPUs are available. Its priority rises over time (now 5), as expected. Another user then submitted job 1739, requesting a gres:gpu:a100_1g.10gb:1 GPU, which is available, but the job does not start because its priority is 1. This is obviously not the desired outcome, and I believe I need to change the scheduling strategy. Could someone with more experience give me some hints?<br>
<br>
Thanks, Cristiano<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="http://lists.schedmd.com/pipermail/slurm-users/attachments/20230802/27400545/attachment-0001.htm">http://lists.schedmd.com/pipermail/slurm-users/attachments/20230802/27400545/attachment-0001.htm</a>><br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Wed, 2 Aug 2023 07:15:06 -0700<br>
From: Michael Gutteridge <michael.gutteridge@gmail.com><br>
To: Slurm User Community List <slurm-users@lists.schedmd.com><br>
Subject: Re: [slurm-users] Job in "priority" status - resources<br>
available<br>
Message-ID:<br>
<CALUL84uJ7yc7H_eb7c1vaHHdoyTRPB5FHz35u8z24mmzWGCFwA@mail.gmail.com><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
I'm not sure there's enough information in your message; the Slurm version and<br>
configuration are often needed for a more confident diagnosis. However,<br>
the behaviour you are looking for (lower-priority jobs skipping the line)<br>
is called "backfill". There are docs here:<br>
<a href="https://slurm.schedmd.com/sched_config.html#backfill">https://slurm.schedmd.com/sched_config.html#backfill</a><br>
<br>
It should be loaded and active by default, which is why I'm not very<br>
confident here. There may also be something else going on with the node<br>
configuration, as it looks like 1596 may be using the same node. Perhaps<br>
there isn't enough CPU or memory to accommodate both jobs (1596 and 1739)?<br>
<br>
HTH<br>
- Michael<br>
<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="http://lists.schedmd.com/pipermail/slurm-users/attachments/20230802/0e4837c3/attachment.htm">http://lists.schedmd.com/pipermail/slurm-users/attachments/20230802/0e4837c3/attachment.htm</a>><br>
<br>
End of slurm-users Digest, Vol 70, Issue 3<br>
******************************************<br>
</div>
</span></font></div>
</body>
</html>