[slurm-users] How to avoid a feature?
Jeffrey R. Lang
JRLang at uwyo.edu
Fri Jul 2 14:44:19 UTC 2021
How about using node weights? Weight the non-GPU nodes so that they are scheduled first. The GPU nodes could have a very high weight so that the scheduler would consider them last for allocation. This would allow the non-GPU nodes to be filled first and, when they are full, the GPU nodes to be scheduled. Users needing a GPU could just include a feature request, which would allocate the GPU nodes as necessary.
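The weighting idea could look something like the following slurm.conf fragment (a sketch only; the node names, counts, and Gres values are hypothetical, and lower Weight means the node is preferred):

```
# slurm.conf sketch (hypothetical node names)
# Scheduler fills low-weight nodes first, so CPU-only nodes are
# allocated before the expensive GPU nodes.
NodeName=cpu[001-016] Weight=1   Feature=cpu
NodeName=gpu[001-004] Weight=100 Feature=gpu Gres=gpu:4
```

A user who actually needs a GPU node would then submit with something like `sbatch --constraint=gpu` (or a `--gres=gpu:1` request) and land on the GPU nodes regardless of their weight.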
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Loris Bennett
Sent: Friday, July 2, 2021 12:48 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] How to avoid a feature?
Tina Friedrich <tina.friedrich at it.ox.ac.uk> writes:
> Hi Brian,
> sometimes it would be nice if SLURM had what Grid Engine calls a 'forced
> complex' (i.e. a feature that you *have* to request to land on a node that has
> it), wouldn't it?
> I do something like that for all of my 'special' nodes (GPU, KNL, nodes...) - I
> want to avoid jobs not requesting that resource or allowing that architecture
> landing on it. I 'tag' all nodes with a relevant feature (cpu, gpu, knl, ...),
> and have a LUA submit verifier that checks for a 'relevant' feature (or a
> --gres=gpu or something) and if there isn't one I add the 'cpu' feature to the
> job.
> Works for us!
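The tagging approach Tina describes could be sketched as a job_submit/lua plugin along these lines (a sketch only, assuming nodes carry 'cpu'/'gpu' features in slurm.conf; the exact job_desc field names should be checked against your Slurm version's job_submit/lua documentation):

```lua
-- job_submit.lua sketch: force a default 'cpu' feature onto jobs
-- that request neither a GPU gres nor a GPU feature.
function slurm_job_submit(job_desc, part_list, submit_uid)
   local wants_gpu = (job_desc.gres ~= nil and job_desc.gres:match("gpu"))
      or (job_desc.features ~= nil and job_desc.features:match("gpu"))
   if not wants_gpu then
      if job_desc.features == nil or job_desc.features == "" then
         job_desc.features = "cpu"
      else
         -- AND the default feature onto whatever the user asked for.
         job_desc.features = job_desc.features .. "&cpu"
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```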
We just have the GPU nodes in a separate partition 'gpu' which users
have to specify if they want a GPU. How does that approach differ from
yours in terms of functionality for you (or the users)?
The main problem with our approach is that the CPUs on the GPU nodes can
remain idle while there is a queue for the regular CPU nodes. What I
would like is to allow short CPU-only jobs to run on the GPU nodes, but only
allow GPU jobs to run there for longer, which I guess I could probably do
within the submit plugin.
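That time-limit idea could be sketched in a job_submit/lua plugin as well (again only a sketch; the 120-minute cutoff is an arbitrary example of a site policy, and job_desc.time_limit is in minutes when set, while an unset limit comes through as a huge NO_VAL and is therefore treated as "long" here):

```lua
-- job_submit.lua sketch: pin long CPU-only jobs to 'cpu' nodes,
-- but let short CPU-only jobs also land on GPU nodes.
local MAX_CPU_MINUTES_ON_GPU = 120  -- hypothetical site policy

function slurm_job_submit(job_desc, part_list, submit_uid)
   local wants_gpu = job_desc.gres ~= nil and job_desc.gres:match("gpu")
   local is_short = job_desc.time_limit ~= nil
      and job_desc.time_limit <= MAX_CPU_MINUTES_ON_GPU
   if not wants_gpu and not is_short then
      if job_desc.features == nil or job_desc.features == "" then
         job_desc.features = "cpu"
      else
         job_desc.features = job_desc.features .. "&cpu"
      end
   end
   return slurm.SUCCESS
end
```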
> On 01/07/2021 15:08, Brian Andrus wrote:
>> I have a partition where one of the nodes has a node-locked license.
>> That license is not used by everyone that uses the partition.
>> They are cloud nodes, so weights do not work (there is an open bug about that).
>> I need to have jobs 'avoid' that node by default. I am thinking I can use a
>> feature constraint, but that seems to only apply to those that want the
>> feature. Since we have so many other users, it isn't feasible to have them
>> modify their scripts, so having it avoid by default would work.
>> Any ideas how to do that? Submit LUA perhaps?
>> Brian Andrus
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin
Email loris.bennett at fu-berlin.de