[slurm-users] [EXT] User association with partition and Qos

Amjad Syed amjadcsu at gmail.com
Tue Aug 31 09:17:56 UTC 2021


Hi Sean

We have been adding by using the following command

sacctmgr modify user set qos+=gpu-rtx-reserved

We have a single account associated with all our users, plus a root
account for admin.



Is that the issue: do we need to associate the user with the account?
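
For reference, this is roughly the sequence I understand Sean to be suggesting below (a sketch; <accountname> and <username> are placeholders, as in his commands):

```shell
# Create the user-account association first, then add the QoS to it:
sacctmgr add user <username> account=<accountname>
sacctmgr modify user where name=<username> account=<accountname> set qos+=gpu-rtx-reserved
```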


On Tue, Aug 31, 2021 at 9:38 AM Sean Crosby <scrosby at unimelb.edu.au> wrote:

> Hi Amjad,
>
> AccountingStorageUser is the user used to connect to the accounting
> database. If you have it defined in slurm.conf, it is ignored.
>
> From the output you showed, it says the user cjr13geu in the cluster
> uea_cluster has access to the QoS.
>
> How are you adding the QoS to other users? The way you would do it would be
>
> sacctmgr modify account <accountname> user=<username> set qos+=gpu-rtx-reserved
>
> or
>
> sacctmgr modify account <accountname> set qos+=gpu-rtx-reserved
>
> if you want to give it to every user in <accountname>
>
> Sean
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Amjad Syed <amjadcsu at gmail.com>
> *Sent:* Tuesday, 31 August 2021 17:46
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] User association with partition and Qos
>
> * External email: Please exercise caution *
> ------------------------------
> Hi Sean
>
> Here is the output for gpu-rtx-reserved qos
>
> sacctmgr show account withassoc -p | grep gpu-rtx-reserved
>
>
>
> default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|
>
>
>
>
>
> scontrol show part gpu-rtx6000-2
>
> PartitionName=gpu-rtx6000-2
>
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
>
>    AllocNodes=ALL Default=NO QoS=N/A
>
>    DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
> Hidden=NO
>
>    MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>
>    Nodes=g[15-29]
>
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>
>    OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
>
>    State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
>
>    JobDefaults=(null)
>
>    DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED
>
>
>
>
> On a different note, we have the following in slurm.conf:
>
>
> AccountingStorageUser=slurm
>
>
> But we have been adding QoS and assigning users as root. Could this be an
> issue?
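>
> For completeness, the accounting part of our slurm.conf looks roughly like
> this (a sketch from memory; the host name is a placeholder):
>
> ```
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageHost=<dbd-host>
> AccountingStorageUser=slurm
> ```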
>
>
>
>
> Amjad
>
> On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby <scrosby at unimelb.edu.au>
> wrote:
>
> What does sacctmgr show for the user you added to have access to the QoS,
> and what does Slurm show for the partition config?
>
> sacctmgr show account withassoc -p
> scontrol show part gpu-rtx6000-2
>
> Sean
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Amjad Syed <amjadcsu at gmail.com>
> *Sent:* Tuesday, 31 August 2021 17:03
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] User association with partition and Qos
>
> ------------------------------
> Hello me again
>
> Just found out that when our slurmctld restarts, all QoS are gone.
>
> I mean users who have an association with the QoS cannot submit jobs with
> sbatch; they get the error:
>
> sbatch: error: Batch job submission failed: Invalid qos specification
>
>
> Do we need to make any more changes in slurm.conf so that the QoS settings
> become permanent?
>
> Amjad
>
> On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <amjadcsu at gmail.com> wrote:
>
> Hi Sean,
>
> Thanks for the suggestion, seems to work now.
>
> Majid
>
> On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <scrosby at unimelb.edu.au>
> wrote:
>
> Hi Amjad,
>
> Make sure you have qos in the config entry AccountingStorageEnforce
>
> e.g.
>
> AccountingStorageEnforce=associations,limits,qos,safe
>
> Sean
>
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Amjad Syed <amjadcsu at gmail.com>
> *Sent:* Friday, 27 August 2021 20:28
> *To:* slurm-users at schedmd.com <slurm-users at schedmd.com>
> *Subject:* [EXT] [slurm-users] User association with partition and Qos
>
> ------------------------------
> Hello all
>
> We are having an issue understanding user associations and partitions.
>
> Currently we have a partition with 30 GPU cards.
>
> We have defined a QoS gpu-rtx that allows a user to reserve 2 cards:
>
> sacctmgr show qos gpu-rtx format=MaxTRESPU%60
>
>                                                    MaxTRESPU
>
>        -----------------------------------------------------
>                                            cpu=96,gres/gpu=2
>
>
>
>
> We have defined a user test that is associated with this QoS:
>
>
> sacctmgr show assoc user=test format=user,qos
>
>       User        QOS
> ---------- ----------
>       test    gpu-rtx
>
>
>
> Now we define another QoS, gpu-rtx-reserved, that allows gpu=8:
>
>
> sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
>
>                                                    MaxTRESPU
>
>        -----------------------------------------------------
>                                            cpu=192,gres/gpu=8
>
> User test is not associated with the gpu-rtx-reserved QoS, so they should
> not be able to use more than gpu=2.
> Both of these QoS are now in slurm.conf for the partition:
>
> PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved
>
>
>
> But we found out that even though the user is not associated with
> gpu-rtx-reserved, if they specify gpu-rtx-reserved in their Slurm script,
> they can reserve 8 GPU cards.
>
>
> So our question is: can users associated with one partition QoS use the
> other QoS in the partition even if they are not associated with it? Or, in
> other words, can we only define one partition QoS and not more than one?
>
>
> Hope I was able to explain.
>
>
> Any advice if we want the partition to use more than one QoS with different
> limits, where users associated with one QoS should not be able to use the
> other QoS?
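>
> For what it's worth, a sketch of the setup we are aiming for (QoS names are
> from this thread; <accountname> and <username> are placeholders):
>
> ```shell
> # slurm.conf: make slurmctld check a requested QoS against the
> # submitting user's association:
> #   AccountingStorageEnforce=associations,limits,qos,safe
>
> # Grant the reserved QoS only to selected users:
> sacctmgr modify user where name=<username> account=<accountname> set qos+=gpu-rtx-reserved
> ```
>
> With that enforcement in place, a user requesting --qos=gpu-rtx-reserved
> without the association should get "Invalid qos specification".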
>
>
> Majid
>
>
>
>
>