[slurm-users] Limit number of jobs on shared nodes?

Fri May 4 08:05:01 MDT 2018

You might try partition QoSes. A QoS attached to a partition can enforce limits that the partition definition alone cannot.
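
As a rough sketch of what that could look like (the QoS names below are made up, and the numbers assume 8 jobs x 3 nodes = 24 running jobs per partition), something along these lines should work. Note that this caps the total number of running jobs in each partition; as far as I know it does not by itself guarantee at most 8 per node, so treat it as a starting point rather than a complete answer.

```
# Create one QoS per partition (names are hypothetical)
sacctmgr add qos bigmem_analysis
sacctmgr add qos bigmem_bio

# Cap the number of running jobs in each QoS: 8 per node x 3 nodes
sacctmgr modify qos bigmem_analysis set GrpJobs=24
sacctmgr modify qos bigmem_bio set GrpJobs=24

# slurm.conf: limits are only enforced if accounting enforcement is on
AccountingStorageEnforce=limits,qos

# slurm.conf: attach each QoS to its partition (other options unchanged)
PartitionName=analysis Nodes=n[144-146] MaxNodes=1 QOS=bigmem_analysis
PartitionName=bio Nodes=n[144-146] MaxNodes=1 QOS=bigmem_bio
```

The QoS limits require slurmdbd-based accounting, and slurm.conf changes need a reconfigure before they take effect. MaxJobsPerUser can be set on the same QoS if you also want a per-user cap.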

-Paul Edmon-

On 05/04/2018 09:59 AM, Liam Forbes wrote:
> We have three "big memory" nodes. We'd like to limit the number of 
> jobs that run per node in two partitions that share these nodes. Jobs 
> in these two partitions are limited to a single node max. We'd only 
> like 8 or fewer jobs from either partition to run per node. So at most 
> only 16 jobs should be allowed to share a given node.
>
> Currently, we have
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU
> in our slurm.conf
>
> The nodes are defined as:
> NodeName=n[144-146] NodeAddr=10.50.50.[144-146] CPUs=56 Sockets=2 
> CoresPerSocket=14 ThreadsPerCore=2 RealMemory=1500000 State=UNKNOWN
>
> The two partitions are defined as:
> PartitionName=analysis Nodes=n[144-146] MaxTime=4-0:0 MaxNodes=1 
> State=UP AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO 
> Default=NO
> PartitionName=bio Nodes=n[144-146] MaxTime=14-0:0 MaxNodes=1 State=UP 
> AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO
>
> We discovered the hard way that this means users can run 4 jobs on 
> each of the 56 CPUs/threads on each node. Oops! Not what we intended.
>
> All our other compute nodes are defined as exclusive, and we don't 
> allow multiple jobs to run on them.
>
> Any recommendations on how to implement the 8 jobs per partition per node 
> limit we'd like? Should we switch our SelectTypeParameters to 
> CR_Socket or CR_Socket_Memory, for example?
>
> -- 
> Regards,
> -liam
>
> -There are uncountably more irrational fears than rational ones. -P. Dolan
> Liam Forbes loforbes at alaska.edu  ph: 907-450-8618 fax: 907-450-8601
> UAF Research Computing Systems Senior HPC Engineer        CISSP
