[slurm-users] Limit number of specific concurrent jobs per node

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Mon May 7 17:17:11 MDT 2018


Hi Andreas,

You could define a generic consumable resource (GRES) per node and have the scheduler take requests for it into account. In principle, you could do this for, say, interface_bandwidth or io_bw and try to use real numbers, but in practice users don't know how much they need or will use, and admins have no way to enforce hard limits anyway. As such, you may as well schedule a more abstract resource - let's call it 'eins'.

Define one 'eins' gres per node and have the jobs you want separated request gres=eins:1. At most one of those jobs will then run on any node at a time. Jobs that don't request gres=eins are not separated and can still share the node.
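
As a rough sketch of the above (not tested on your cluster): the config lines are shown as comments, and the node range node[001-056] is just a placeholder, since I don't know your node names. Depending on the Slurm version, the gres.conf line may or may not be needed for a count-only resource, so check the gres.conf man page for your release. The helper name submit_iojob and the DRY_RUN switch are made up for illustration.

```shell
# Config sketch (node names are hypothetical; keep your existing node options):
#   slurm.conf:  GresTypes=eins
#                NodeName=node[001-056] Gres=eins:1
#   gres.conf:   Name=eins Count=1

# Submit one I/O-heavy job script, requesting the single 'eins' on a node.
# With DRY_RUN set to a non-empty value, the command is printed, not run.
submit_iojob() {
  ${DRY_RUN:+echo} sbatch -J iojob -n 1 --gres=eins:1 "$1"
}

for filename in runtimes/*/jobscript.sh; do
  [ -e "$filename" ] || continue   # skip if the glob matched nothing
  submit_iojob "$filename"
done
```

Running the loop once with DRY_RUN=1 exported shows the exact sbatch lines before anything is submitted.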

If you want a bit more flexibility, try a resource with a larger count per node (a different resource name, say 'vier', with 4 per node). Jobs could then request gres=vier:1 (up to 4 will run on a node), gres=vier:2 (only 2 per node) and so on (but not gres=vier:5, which no node could satisfy!).
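
The arithmetic behind those numbers is just integer division of the per-node count by the per-job request. A small sketch, assuming the gres is the only limiting factor (cores and memory can of course limit things further); jobs_per_node is a made-up helper name:

```shell
# Jobs that fit on one node = floor(per-node gres count / per-job request).
# $1 = gres count defined per node, $2 = gres count requested per job.
jobs_per_node() {
  echo $(( $1 / $2 ))
}

jobs_per_node 4 1   # prints 4
jobs_per_node 4 2   # prints 2
jobs_per_node 4 3   # prints 1 (one 'vier' is left unused)
jobs_per_node 4 5   # prints 0 (no node can ever satisfy the request)
```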

Maybe name the resource 'iocount' and expect heavy I/O users to request only 1. Then you can tune how many you make available per node later, without requiring the users to change behaviour.

You could combine this with extra partitions and/or a filter to set defaults and make the choice/usage easier for your users.

Gareth 

-----Original Message-----
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Andreas Hilboll
Sent: Tuesday, 8 May 2018 2:58 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Limit number of specific concurrent jobs per node

Dear SLURM experts,

we have a cluster of 56 nodes with 28 cores each.  Is it possible to limit the number of jobs of a certain name which concurrently run on one node, without blocking the node for other jobs?

For example, when I do

   for filename in runtimes/*/jobscript.sh; do
     sbatch -J iojob -n 1 "$filename"
   done

How can I ensure that only one of these jobs runs per node?  The jobs are very lightweight computationally and only use 1 core each, but since they are rather heavy on the I/O side, I'd like to ensure that when a job runs, it doesn't have to share the available I/O bandwidth with other jobs.  (This would actually work since usually our other jobs are not I/O intensive.)

From reading the manpage, I couldn't figure out how to do this.


Sunny greetings,
 Andreas



