[slurm-users] Set a per-cluster default limit of the number of active cores per user at a time

Paddy Doyle paddy at tchpc.tcd.ie
Fri Jun 19 21:24:51 UTC 2020


Hi all,

I've been trying to understand how to properly set a limit on the number of
cores a user (or an association would be fine too) can have in use at any
one time.

Ideally, I'd like to be able to set a default value once for the cluster,
and then have it inherit down to lots of associations and users. And there
are multiple clusters that need such a limit.

Our setup has a single shared Slurmdbd, with multiple clusters connected
back to it (I think that's important for QOS-based solutions).

Most of the previous mails about this on the list (I know it's come up many
times before) talk about QOS-based solutions, but the problem is that the
QOS limits are global across all clusters, and so we can't use them like
that.

I've tried lots of different sacctmgr options on a test cluster, and can't
seem to get it right. Any help would be really appreciated!


I'll go through what I've tried:


MaxJobs: this is not right, as it limits the jobs, not the number of cores.
So a user can have lots of high-core-count jobs.


  sacctmgr update qos normal set maxtresperuser=cpu=32

That will work... except that QOS limits are global across all of the
slurmdbd-connected clusters. So unless every cluster is the same size and
needs the same policy, it won't work in practice.


  sacctmgr update account cluster=C1 set MaxTRES=cpu=32 where account=A1

That limit is per-job, not per user.


  sacctmgr update account cluster=C1  set GrpTRES=cpu=32

That limits the entire cluster to a maximum of 32 cores in use, so that's
not right.


  sacctmgr update account cluster=C1  set GrpTRES=cpu=32 where account=A1

That will work alright for *that* account.

But the idea of having to do this for many tens of accounts doesn't leave me
too happy. We would also have to make it part of the new-account workflow,
and any future policy change would have to be reapplied individually to
every existing account.
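For illustration, the per-account workaround above can at least be scripted so that a policy change means regenerating and rerunning a list of commands. This is a minimal sketch, assuming the account names come from something like `sacctmgr -n -P show assoc where cluster=C1 format=Account`; the helper name `print_grptres_cmds` is hypothetical, and it only prints commands for review rather than running them. It does not change the underlying problem: GrpTRES on an account still caps the account as a whole, not each user in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): emit one sacctmgr command per
# account so the same per-cluster GrpTRES limit can be (re)applied in bulk.
# Reads account names on stdin, one per line; prints commands, runs nothing.
print_grptres_cmds() {
  local cluster="$1" tres="$2" acct
  # De-duplicate (association listings repeat an account once per user)
  # and skip "root": a GrpTRES on root would cap the whole cluster.
  sort -u | while IFS= read -r acct; do
    [ -z "$acct" ] && continue
    [ "$acct" = "root" ] && continue
    printf 'sacctmgr -i modify account where name=%s cluster=%s set GrpTRES=%s\n' \
      "$acct" "$cluster" "$tres"
  done
}
```

A possible usage, again assuming the listing command above works on your Slurm version: pipe the account list into `print_grptres_cmds C1 cpu=32 > apply.sh`, check the file, then run it with `bash apply.sh`. Keeping generation and execution separate makes it easy to diff the intended changes before committing them to slurmdbd.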


Is there some other way that I've missed?

Thanks!

Paddy

-- 
Paddy Doyle
Research IT / Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
https://www.tchpc.tcd.ie/


