[slurm-users] ignore gpu resources to schedule the cpu based jobs

Loris Bennett loris.bennett at fu-berlin.de
Tue Jun 16 14:23:45 UTC 2020


Diego Zuccato <diego.zuccato at unibo.it> writes:

> On 16/06/20 09:39, Loris Bennett wrote:
>
>>> Maybe it's already known and obvious, but... Remember that a node can be
>>> allocated to only one partition.
>> Maybe I am misunderstanding you, but I think that this is not the case.
>> A node can be in multiple partitions.
>
> *Assigned* to multiple partitions: OK.
> But once slurm schedules a job in "partGPU" on that node, the whole node
> is unavailable for jobs in "partCPU", even if the GPU job is using only
> 1% of the resources.

Thanks for pointing this out - I hadn't been aware of this.  Is there
anywhere in the documentation where this is explicitly stated?
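
For concreteness, the kind of overlapping assignment I mean looks
roughly like this in slurm.conf (node and partition names here are
invented for illustration, and the matching gres.conf is omitted):

  NodeName=node[01-04] CPUs=32 RealMemory=192000 Gres=gpu:2
  PartitionName=partGPU Nodes=node[01-04] MaxTime=7-00:00:00 State=UP
  PartitionName=partCPU Nodes=node[01-04] MaxTime=7-00:00:00 State=UP

Both partitions list the same nodes, which is why I had assumed that
CPU-only jobs could fill the CPUs left idle by GPU jobs.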

>>  We have nodes belonging to
>> individual research groups which are in both a separate partition just
>> for the group and in a 'scavenger' partition for everyone (but with
>> lower priority and maximum run-time).
>
> More or less our current config. Quite inefficient, at least for us: too
> many unusable resources due to small jobs.

Our scavenger partition tends to be used mostly by a small number of
users each with a huge number of small, short jobs.  Thus, they tend to
fill nodes and not block resources for that long, but I probably need to
look at this a bit more carefully.
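
For reference, our group/scavenger arrangement is roughly of the
following shape (names and limits are invented here, not our real
values):

  PartitionName=group1    Nodes=node[01-04] AllowGroups=group1 PriorityTier=10 MaxTime=14-00:00:00
  PartitionName=scavenger Nodes=node[01-32] PriorityTier=1 MaxTime=1-00:00:00

The higher PriorityTier on the group partition means that pending group
jobs are considered before scavenger jobs competing for the same nodes.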

>>> So, if you have the mixed nodes in both
>>> partitions and there's a GPU job running, a non-gpu job will find that
>>> node marked as busy because it's allocated to another partition.
>>> That's why we're drastically reducing the number of partitions we have
>>> and will avoid shared nodes.

>> Again, I don't think this is the explanation.  If a job is running on
>> a GPU node, but not using all the CPUs, then a CPU-only job should be
>> able to start on that node, unless some form of exclusivity has been
>> set up, such as ExclusiveUser=YES for the partition.

> Nope. The whole node gets allocated to one partition at a time. So if
> the GPU job and the CPU one are in different partitions, it's expected
> that only one starts. The behaviour you're looking for is that of QoS:
> define a single partition w/ multiple QoS and both jobs will run
> concurrently.
>
> If you think about it, that's the meaning of "partition" :)

Like I said, this is new to me, but personally I don't think it is
linguistically obvious.  If the membership of a node in a partition
changes over time and just depends on which jobs happen to be running
on it at a given moment, then, to my mind, that's not much like the
physical concept of partitioning a room or a city.
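
If I have understood the QoS suggestion correctly, the alternative
would be a single partition whose jobs are differentiated by QoS,
something like this (all names and limits invented for illustration):

  # slurm.conf: one partition spanning all nodes
  PartitionName=main Nodes=node[01-32] Default=YES State=UP

  # sacctmgr: create the QoS and set their limits
  sacctmgr add qos gpujobs
  sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=2
  sacctmgr add qos cpujobs
  sacctmgr modify qos cpujobs set MaxWall=1-00:00:00

Users would then pick a QoS at submission time, e.g.

  sbatch --qos=gpujobs --gres=gpu:1 job.sh

and GPU and CPU jobs could share a node, since only one partition is
involved.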

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


