[slurm-users] [EXT] Set a per-cluster default limit of the number of active cores per user at a time

Sean Crosby scrosby at unimelb.edu.au
Fri Jun 19 23:20:02 UTC 2020


Hi Paddy,

Why don't you add a new QoS for each partition, attach it as the partition
QoS, and then set the limits on those partition QoSes?

Like

sacctmgr add qos cloud

and then, in slurm.conf:

PartitionName=cloud Nodes=node[1-6] Default=YES MaxTime=30-0
DefaultTime=0:10:0 State=DOWN QoS=cloud

That way you could have different QoS names for the partitions across all
of your clusters, and set the limits on each QoS.
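
For example (sketch only; the 32-core figure is just an illustration):

sacctmgr modify qos cloud set MaxTRESPerUser=cpu=32

Since the limit lives on the partition QoS rather than on a QoS the users
submit with, each cluster (and indeed each partition) can carry a
different value.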

Sean

--
Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Sat, 20 Jun 2020 at 07:24, Paddy Doyle <paddy at tchpc.tcd.ie> wrote:

> Hi all,
>
> I've been trying to understand how to properly set a limit on the number of
> cores that a user (or an association; either is fine) can have in use at any
> one time.
>
> Ideally, I'd like to be able to set a default value once for the cluster,
> and then have it inherit down to lots of associations and users. And there
> are multiple clusters that need such a limit.
>
> Our setup has a single shared Slurmdbd, with multiple clusters connected
> back to it (I think that's important for QOS-based solutions).
>
> Most of the previous mails about this on the list (I know it's come up many
> times before) talk about QOS-based solutions, but the problem is that the
> QOS limits are global across all clusters, and so we can't use them like
> that.
>
> I've tried lots of different sacctmgr options on a test cluster, and can't
> seem to get it right. Any help would be really appreciated!
>
>
> I'll go through what I've tried:
>
>
> MaxJobs: this is not right, as it limits the number of running jobs, not the
> number of cores, so a user can still have lots of high-core-count jobs.
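>
> (For illustration only -- the user name and numbers here are made up --
> something like
>
>   sacctmgr update user name=someuser set MaxJobs=4
>
> would cap that user at 4 running jobs, but each of those could still be a
> 128-core job.)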
>
>
>   sacctmgr update qos normal set maxtresperuser=cpu=32
>
> That will work... except that QOS is global across all of the
> slurmdbd-connected clusters. So unless every cluster is the same size and is
> meant to have the same policy, it won't work in practice.
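>
> (To make that concrete: a QoS record has no cluster dimension at all; e.g.
>
>   sacctmgr show qos format=Name,MaxTRESPerUser
>
> lists one MaxTRESPerUser value per QoS, with no per-cluster column, so the
> same cap would apply on every cluster sharing the slurmdbd.)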
>
>
>   sacctmgr update account cluster=C1 set MaxTRES=cpu=32 where account=A1
>
> That limit is per-job, not per user.
>
>
>   sacctmgr update account cluster=C1  set GrpTRES=cpu=32
>
> That limits a max of 32 cores in use over the entire cluster, so that's not
> right.
>
>
>   sacctmgr update account cluster=C1  set GrpTRES=cpu=32 where account=A1
>
> That will work alright for *that* account.
>
> But the idea of having to do this for many tens of accounts doesn't leave me
> too happy. We would also have to make it part of the new-account workflow,
> and any future policy change would have to be applied individually to every
> existing account.
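>
> (A one-off loop could at least script the initial pass -- sketch only,
> taking the account list straight from sacctmgr and reusing the 32-core
> figure from above:
>
>   for a in $(sacctmgr -nP show account format=Account); do
>     sacctmgr -i update account cluster=C1 set GrpTRES=cpu=32 where account=$a
>   done
>
> but new accounts and any later policy change would still need the same
> treatment.)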
>
>
> Is there some other way that I've missed?
>
> Thanks!
>
> Paddy
>
> --
> Paddy Doyle
> Research IT / Trinity Centre for High Performance Computing,
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> Phone: +353-1-896-3725
> https://www.tchpc.tcd.ie/
>
>