[slurm-users] Re: What is the 'Root/Cluster association' level in Resource Limits document mean?

taleintervenor at sjtu.edu.cn taleintervenor at sjtu.edu.cn
Thu Feb 10 08:42:11 UTC 2022


Well, ‘sacctmgr modify cluster name=***’ is exactly what we want, and
inspired by this command, we found that ‘sacctmgr show cluster’ can
clearly list all the cluster associations.

 

But during testing we found another problem. When a limit is defined at both
the cluster level and the user level, the smaller one takes effect; the user
association does not take precedence over the cluster-level one. For example:

> sacctmgr show association format=cluster,account,user,grptres,qos
   Cluster    Account       User       GrpTRES                  QOS
---------- ---------- ---------- ------------- --------------------
    sjtupi       root               gres/gpu=1               normal
    sjtupi   acct-hpc                                        normal
    sjtupi   acct-hpc     hpczty    gres/gpu=2               normal

The cluster association sets a 1-GPU limit and the user association sets a
2-GPU limit, and a 2-GPU job is then blocked:

> scontrol show job 6567880
JobId=6567880 JobName=test
   UserId=hpczty(3861) GroupId=hpczty(3861) MCS_label=N/A
   Priority=127 Nice=0 Account=acct-hpc QOS=normal
   JobState=PENDING Reason=AssocGrpGRES Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   …
   NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=7G,node=1,billing=1,gres/gpu=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=7G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   …

According to the official documentation at
https://slurm.schedmd.com/resource_limits.html , the user association at
hierarchy level 3 should take precedence over the cluster association at
hierarchy level 5. Is this a bug, or is the documentation wrong?
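
The behaviour in the example above is consistent with each GrpTRES limit
along the association chain being checked on its own, rather than the
user-level value overriding the cluster-level one. Below is a minimal Python
sketch of that model. It is not Slurm's actual implementation: the CHAIN list
and the blocking_level function are hypothetical names, the limits are copied
from the sacctmgr output above, and usage from other running jobs is ignored.

```python
from typing import List, Optional, Tuple

# Association chain for the job in the example, from the user level up to
# the cluster (root) level. Each value is the gres/gpu GrpTRES limit shown
# in the sacctmgr output; None means no limit is set at that level.
CHAIN: List[Tuple[str, Optional[int]]] = [
    ("user hpczty", 2),
    ("account acct-hpc", None),
    ("cluster sjtupi (root)", 1),
]


def blocking_level(chain: List[Tuple[str, Optional[int]]],
                   requested: int) -> Optional[str]:
    """Return the first level whose limit the request exceeds, or None."""
    for name, limit in chain:
        if limit is not None and requested > limit:
            return name
    return None


print(blocking_level(CHAIN, 2))  # the 2-GPU job from the example
print(blocking_level(CHAIN, 1))  # a 1-GPU job for comparison
```

Under this model the 2-GPU request passes the user-level check but fails at
the cluster level, which matches the pending job with Reason=AssocGrpGRES.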

 

From: Paul Brunk <pbrunk at uga.edu>
Sent: February 10, 2022 10:28
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] What is the 'Root/Cluster association' level in
Resource Limits document mean?

 

Hi:

 

You can use e.g. 'sacctmgr show -s users', and you'll see each user's
cluster association as one of the output columns.  If the name were
'yourcluster', then you could do: sacctmgr modify cluster
name=yourcluster set grpTres="node=8".
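
For scripting the cluster-level change suggested above, here is a minimal
Python sketch that only builds and prints the command line; it does not run
sacctmgr. The function name cluster_limit_command is hypothetical, and
'yourcluster' and "node=8" are the placeholder values from this message.

```python
import shlex
from typing import List


def cluster_limit_command(cluster: str, grptres: str) -> List[str]:
    """Build the argv for setting GrpTRES on a cluster association."""
    return [
        "sacctmgr", "modify", "cluster",
        f"name={cluster}",
        "set", f"grptres={grptres}",
    ]


argv = cluster_limit_command("yourcluster", "node=8")
# Print the command for review instead of executing it.
print(shlex.join(argv))
```

Once reviewed, the argv list could be passed to subprocess.run on a host
where sacctmgr is installed.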

 

== 

Paul Brunk, system administrator

Georgia Advanced Resource Computing Center

Enterprise IT Svcs, the University of Georgia

 

 

On 2/8/22, 2:33 AM, "slurm-users" <slurm-users-bounces at lists.schedmd.com>
wrote:

…[H]ow to check or modify this “cluster association”? Using the command
sacctmgr show association, I can only list all users’ associations.

Consider the scenario in which we want to set a default node-count limit
for all users. A command such as sacctmgr modify user set grptres="node=8"
can indeed set the limit on all users at once, but it will overwrite the
existing per-user limits on some specific accounts, so it may not be a
satisfying solution. If the “cluster association” exists, it may be exactly
what we want. So how do we set the “cluster association”?
