[slurm-users] How to avoid a feature?

Tina Friedrich tina.friedrich at it.ox.ac.uk
Fri Jul 2 15:09:01 UTC 2021


:) That was the first thing we tried/did - however, that only works if 
your cluster isn't habitually 100% busy with jobs waiting. So that 
didn't work very well - even with the weighting set up so that the GPU 
nodes were the 'last resort' (after all the special high-memory nodes), 
they were always running CPU jobs.
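
For context, the weighting was roughly along these lines - node names 
and counts here are purely illustrative, not our actual config; Slurm 
allocates the lowest-weight eligible nodes first:

    # slurm.conf (illustrative)
    NodeName=cpu[001-100]   Weight=1
    NodeName=himem[01-04]   Weight=50
    NodeName=gpu[01-08]     Weight=100  Gres=gpu:4

With the queue permanently full, even the Weight=100 nodes get picked 
up for CPU-only work the moment everything else is allocated.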

(And I did read a lot of the 'how can we reserve X amount of cores for 
GPU work' threads I could find, but none of them seemed very 
straightforward - and hey, given that all the GPUs are also always in 
use, I don't think we're wasting much in the way of resources with this 
setup.)
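
In case it's useful, the submit verifier I mention below boils down to 
something like this - a simplified sketch rather than our production 
script, and note that the job_desc field holding the GRES request 
(gres vs. tres_per_node) differs between Slurm versions:

    -- job_submit.lua (sketch): add a default 'cpu' feature to any job
    -- that requests neither a GPU gres nor an explicit constraint.
    function slurm_job_submit(job_desc, part_list, submit_uid)
       local gres = job_desc.tres_per_node or job_desc.gres or ""
       local features = job_desc.features or ""
       if string.find(gres, "gpu") == nil and features == "" then
          -- no GPU and no constraint given: pin the job to CPU-only nodes
          job_desc.features = "cpu"
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end

    return slurm.SUCCESS

Since the GPU and KNL nodes are only tagged with their own feature, a 
job that picks up the 'cpu' feature simply can't land on them.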

Tina

On 02/07/2021 15:44, Jeffrey R. Lang wrote:
> How about using node weights? Weight the non-GPU nodes so that they are 
> scheduled first. The GPU nodes could have a very high weight so that the 
> scheduler would consider them last for allocation. This would allow the 
> non-GPU nodes to be filled first and, once they are full, the scheduler 
> would move on to the GPU nodes. Users needing a GPU could just include a 
> feature request, which should allocate the GPU nodes as necessary.
> 
> Jeff
> 
> 
> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Loris Bennett
> Sent: Friday, July 2, 2021 12:48 AM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] How to avoid a feature?
> 
> Hi Tina,
> 
> Tina Friedrich <tina.friedrich at it.ox.ac.uk> writes:
> 
>> Hi Brian,
>>
>> sometimes it would be nice if SLURM had what Grid Engine calls a 'forced
>> complex' (i.e. a feature that you *have* to request to land on a node that has
>> it), wouldn't it?
>>
>> I do something like that for all of my 'special' nodes (GPU, KNL nodes,
>> ...) - I want to avoid jobs that don't request that resource (or can't
>> use that architecture) landing on them. I 'tag' all nodes with a
>> relevant feature (cpu, gpu, knl, ...), and have a LUA submit verifier
>> that checks for a 'relevant' feature (or a --gres=gpu or something),
>> and if there isn't one I add the 'cpu' feature to the request.
>>
>> Works for us!
> 
> We just have the GPU nodes in a separate partition 'gpu' which users
> have to specify if they want a GPU.  How does that approach differ from
> yours in terms of functionality for you (or the users)?
> 
> The main problem with our approach is that the CPUs on the GPU nodes can
> remain idle while there is a queue for the regular CPU nodes.  What I
> would like is to allow short CPU-only jobs to run on the GPU nodes, but
> to only allow GPU jobs to run there for longer, which I guess I could
> probably do within the submit plugin.
> 
> Cheers,
> 
> Loris
> 
> 
>> Tina
>>
>> On 01/07/2021 15:08, Brian Andrus wrote:
>>> All,
>>>
>>> I have a partition where one of the nodes has a node-locked license.
>>> That license is not used by everyone who uses the partition.
>>> They are cloud nodes, so weights do not work (there is an open bug about
>>> that).
>>>
>>> I need to have jobs 'avoid' that node by default. I am thinking I can use a
>>> feature constraint, but that seems to only apply to jobs that request the
>>> feature. Since we have so many other users, it isn't feasible to have them
>>> all modify their scripts, so having jobs avoid the node by default would work.
>>>
>>> Any ideas how to do that? Submit LUA perhaps?
>>>
>>> Brian Andrus
>>>
>>>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
> 

-- 
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk


