[slurm-users] Compact scheduling strategy for small GPU jobs

Jack Chen scsvip at gmail.com
Tue Aug 10 16:19:51 UTC 2021


Thanks for your reply! Of course Slurm will not place small jobs on the same
node if the resources there are not available. But in my case the jobs use
default values for everything else; the job command is:

    srun -n 1 --cpus-per-task=2 --gres=gpu:1 sleep 12000
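
For reference, the same resource request written as a batch script (a minimal
sketch; everything not shown is left at the site defaults) would be:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2
    #SBATCH --gres=gpu:1
    sleep 12000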

When I submit another eight 1-GPU jobs, they run fine on both node A and
node B, so I believe we can rule out resource exhaustion as the cause.
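
One way to double-check that is to compare the configured and allocated
trackable resources on each node, for example (nodeA/nodeB are placeholder
names; CfgTRES/AllocTRES appear in the output of newer Slurm versions):

    scontrol show node nodeA | grep -E 'CfgTRES|AllocTRES'
    scontrol show node nodeB | grep -E 'CfgTRES|AllocTRES'

If CPUs and memory are nowhere near exhausted while GPUs sit idle, the
placement is a scheduling decision rather than a resource limit.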

Slurm versions >= 17 support GPU-related parameters that help jobs run even
when resources are fragmented. But it would be a great help if Slurm supported
a compact scheduling strategy that packs these small GPU jobs onto one node,
so that the fragmentation does not occur in the first place.
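
For what it's worth, these are the slurm.conf knobs I have found so far that
seem related to packing. This is only a sketch based on the man pages, and I
have not yet verified how they interact with GPU jobs; the node definitions
below are abbreviated placeholders:

    # slurm.conf sketch, untested
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_Pack_Nodes  # CR_Pack_Nodes packs a job's tasks tightly within its allocation
    SchedulerParameters=pack_serial_at_end             # bias single-task jobs toward the end of the node list
    # Lower-weight nodes are selected first, giving a deterministic fill order
    NodeName=nodeA Gres=gpu:8 Weight=1
    NodeName=nodeB Gres=gpu:8 Weight=2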

I will set up the newest Slurm version later and rerun the test case above.
There are thousands of machines in my cluster and users want to submit
hundreds of small jobs, so fragmentation is a real problem for us.
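
For anyone who wants to reproduce it, the test case is roughly the following
(shell sketch; the sleep time, CPU and GPU counts match my example above):

    # submit eight 1-GPU jobs, then one 8-GPU job
    for i in $(seq 1 8); do
        sbatch --ntasks=1 --cpus-per-task=2 --gres=gpu:1 --wrap='sleep 12000'
    done
    sbatch --ntasks=1 --gres=gpu:8 --wrap='sleep 12000'

    # check where the small jobs landed and whether the 8-GPU job is pending
    squeue -u $USER -o '%.10i %.9T %.12N %b'

With the behaviour I am seeing, the small jobs end up split 6 and 2 across the
two nodes and the 8-GPU job stays pending.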

PS: I sent the earlier reply only to Diego and forgot to reply all. (:


On Tue, Aug 10, 2021 at 11:44 PM Brian Andrus <toomuchit at gmail.com> wrote:

> You may want to look at your resources. If the memory allocation adds up
> such that there isn't enough left for any job to run, it won't matter that
> there are still GPUs available.
>
> The same goes for any other resource (CPUs, cores, etc.).
>
> Brian Andrus
>
>
> On 8/10/2021 8:07 AM, Jack Chen wrote:
>
> Does anyone have any ideas on this?
>
> On Fri, Aug 6, 2021 at 2:52 PM Jack Chen <scsvip at gmail.com> wrote:
>
>> I'm using Slurm 15.08.11. When I submit several 1-GPU jobs, Slurm does not
>> allocate them to nodes using a compact strategy. Does anyone know how to
>> solve this? Will upgrading to the latest Slurm version help?
>>
>> For example, there are two nodes A and B with 8 GPUs per node. I submitted
>> eight 1-GPU jobs, and Slurm allocated the first 6 jobs on node A and the
>> last 2 jobs on node B. Then, when I submitted one job requesting 8 GPUs, it
>> stayed pending because of GPU fragmentation: node A had 2 idle GPUs and
>> node B had 6 idle GPUs.
>>
>> Thanks in advance!
>>
>