[slurm-users] Holding back jobs over QOS limit

Lech Nieroda lech.nieroda at uni-koeln.de
Wed Aug 21 12:50:21 UTC 2019


Hello Florian,

Unless the proposed order of job execution must be adhered to at all times, it might be easier and fairer to use the fairshare mechanism.
As the name suggests, it was created to provide each user (or account) with a fair share of resources. It uses previously consumed computation time to influence job priority.
It can be fine-tuned with the PriorityWeight options, e.g. PriorityWeightAge=0 if you want to ignore the time a job spends in the queue. It can also be modeled to mirror a hierarchical and weighted account structure, e.g. with various groups or departments.
This might be a fairer solution, as it’s not the number of jobs but the actual CPU usage over time that is balanced out.
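
To illustrate the idea with your scenario, here is a rough Python sketch. It is a simplified model and not Slurm's actual implementation: the function names, the 7-day half-life, the equal 50% shares, the age factors and the weight values are all assumptions made up for the example, and the real fairshare algorithm depends on the Slurm version and configuration. In a real setup the corresponding knobs would be slurm.conf options such as PriorityType=priority/multifactor, PriorityWeightFairshare, PriorityWeightAge and PriorityDecayHalfLife.

```python
# Simplified model of fairshare-based prioritization. This is NOT Slurm's
# actual implementation; all function names and numbers are illustrative.

HALF_LIFE = 7 * 24 * 3600  # assumed usage half-life in seconds


def decayed_usage(records, now, half_life=HALF_LIFE):
    """Sum past usage, halving the weight of each record every half_life seconds.

    records: list of (end_time, node_seconds) tuples.
    """
    return sum(used * 0.5 ** ((now - end) / half_life) for end, used in records)


def fairshare_factor(user_usage, total_usage, share):
    """Return a factor in (0, 1]: 1.0 for no usage, smaller for heavier users.

    Uses the form 2 ** (-normalized_usage / share), where share is the
    user's entitled fraction of the cluster.
    """
    if total_usage == 0:
        return 1.0
    return 2.0 ** (-(user_usage / total_usage) / share)


def job_priority(age_factor, fs_factor, weight_age, weight_fairshare):
    """Weighted sum of the two factors, each expected in the range 0..1."""
    return weight_age * age_factor + weight_fairshare * fs_factor


# Scenario from the question: job #1 of user X just finished after one hour
# on two nodes; user Y has no recorded usage.
now = 3600
usage = {
    "X": decayed_usage([(now, 2 * 3600)], now),
    "Y": decayed_usage([], now),
}
total = sum(usage.values())
factor = {user: fairshare_factor(u, total, share=0.5) for user, u in usage.items()}

# Pending jobs: (job id, user, age factor). Job #2 has waited longer than #3.
pending = [(2, "X", 1.0), (3, "Y", 0.1)]


def start_order(weight_age, weight_fairshare=10000):
    """Return pending job ids sorted by descending priority."""
    ranked = sorted(
        pending,
        key=lambda job: job_priority(job[2], factor[job[1]], weight_age, weight_fairshare),
        reverse=True,
    )
    return [job_id for job_id, _, _ in ranked]


print(start_order(weight_age=0))      # fairshare only
print(start_order(weight_age=10000))  # queue time counts as much as fairshare
```

With these made-up numbers, the first call ranks job #3 ahead of job #2, because user X has just consumed resources while user Y has not. The second call ranks job #2 first again, since its longer queue time outweighs the fairshare difference. That is why lowering or zeroing the age weight matters for the behavior you describe.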

Regards,
Lech


> On 21.08.2019 at 14:13, Jochheim, Florian <florian.jochheim at mpibpc.mpg.de> wrote:
> 
> Hi Folks,
>  
> We have a small, simple Slurm cluster set up to facilitate fair usage of the computing resources in our group. Simple in the sense that users only run exclusive jobs on single nodes so far. For fairness, we have set
>  
> MaxSubmitJobsPerUser=2
> MaxJobsPerUser=2
>  
> I would however like to change the policy in the following way:
>  
> 1) Each user can submit as many jobs as he/she wants
> 2) Only two nodes can be used by a single user at any given time, also enabling MPI jobs on up to two nodes
> 3) Each job that a user has in the queue while being over the limit in 2) will be at the very last position in the queue
>  
> 1) and 2) are easy enough: I just get rid of MaxSubmitJobsPerUser and set MaxTRESPerUser="node=2".
>  
> I have not come up with a good way to implement 3), though. I would like the following behavior:
>  
> A) User X submits two jobs (Id #1 and #2) requiring two nodes each; #1 will start, #2 will be held back (QOSMaxNodePerUserLimit)
> B) Assume all nodes are taken now
> C) User Y has no running or queued jobs and submits a job (Id #3)
> D) Job #1 finishes, freeing up resources
>  
> Current behavior: Job #2 starts, as it was submitted earlier
> What I want: Job #3 should start first as User Y was not over their QOS limit while User X was at the time of submission
>  
> My thinking: In this way, users could submit as many jobs as they want (which they would like for convenience reasons) without getting unfair precedence over others.
> We want to distribute our resources as equally as possible.
>  
> Is something like this possible to achieve? Does anyone have an idea how it could be done? I am happy to hear your thoughts.
>  
> Cheers,
> Florian



