[slurm-users] [EXT] User association with partition and Qos

Amjad Syed amjadcsu at gmail.com
Tue Aug 31 10:03:47 UTC 2021


Just a correction

We use
sacctmgr modify user=<username> set qos+=gpu-rtx6000-2
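To double-check that the modification took effect, a verification along these lines can help (a sketch; <username> is a placeholder):

```shell
# Show the user's association and its QoS list (placeholder username)
sacctmgr show assoc user=<username> format=user,account,qos%40
```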

Amjad

On Tue, Aug 31, 2021 at 10:17 AM Amjad Syed <amjadcsu at gmail.com> wrote:

> Hi Sean
>
> We have been adding by using the following command
>
> sacctmgr modify user set qos+=gpu-rtx-reserved
>
> We have a single account that is associated with all our users, plus a root
> account for admin.
>
>
>
> Is that the issue? Do we need to associate each user with an account?
>
>
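> For reference, the kind of commands that would create the account association
> first and then grant the QoS (a sketch; <accountname> and <username> are
> placeholders):
>
> ```shell
> # Associate the user with the account, then add the QoS to that association
> sacctmgr add user <username> account=<accountname>
> sacctmgr modify user where user=<username> account=<accountname> set qos+=gpu-rtx-reserved
> ```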
> On Tue, Aug 31, 2021 at 9:38 AM Sean Crosby <scrosby at unimelb.edu.au>
> wrote:
>
>> Hi Amjad,
>>
>> AccountingStorageUser is the user used to connect to the accounting
>> database. If you have it defined in slurm.conf, it is ignored.
>>
>> From the output you showed, it says the user cjr13geu in the cluster
>> uea_cluster has access to the QoS.
>>
>> How are you adding the QoS to other users? The way you would do it would
>> be
>>
>> sacctmgr modify account <accountname> user=<username> set qos+=gpu-rtx-reserved
>>
>> or
>>
>> sacctmgr modify account <accountname> set qos+=gpu-rtx-reserved
>>
>> if you want to give it to every user in <accountname>
>>
>> Sean
>> ------------------------------
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Amjad Syed <amjadcsu at gmail.com>
>> *Sent:* Tuesday, 31 August 2021 17:46
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* Re: [slurm-users] [EXT] User association with partition and
>> Qos
>>
>> * External email: Please exercise caution *
>> ------------------------------
>> Hi Sean
>>
>> Here is the output for gpu-rtx-reserved qos
>>
>> sacctmgr show account withassoc -p | grep gpu-rtx-reserved
>>
>>
>>
>> default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|
>>
>>
>>
>>
>>
>> scontrol show part gpu-rtx6000-2
>>
>> PartitionName=gpu-rtx6000-2
>>
>>    AllowGroups=ALL AllowAccounts=ALL
>> AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
>>
>>    AllocNodes=ALL Default=NO QoS=N/A
>>
>>    DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO
>> GraceTime=0 Hidden=NO
>>
>>    MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO
>> MaxCPUsPerNode=UNLIMITED
>>
>>    Nodes=g[15-29]
>>
>>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
>> OverSubscribe=NO
>>
>>    OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
>>
>>    State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
>>
>>    JobDefaults=(null)
>>
>>    DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED
>>
>>
>>
>>
>> On a different note, we have the following in slurm.conf:
>>
>>
>> AccountingStorageUser=slurm
>>
>>
>> But we have been adding QoS and assigning users as root. Could this be an
>> issue?
>>
>>
>>
>>
>> Amjad
>>
>> On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby <scrosby at unimelb.edu.au>
>> wrote:
>>
>> What does sacctmgr show for the user you added to have access to the QoS,
>> and what does Slurm show for the partition config?
>>
>> sacctmgr show account withassoc -p
>> scontrol show part gpu-rtx6000-2
>>
>> Sean
>> ------------------------------
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Amjad Syed <amjadcsu at gmail.com>
>> *Sent:* Tuesday, 31 August 2021 17:03
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* Re: [slurm-users] [EXT] User association with partition and
>> Qos
>>
>> * External email: Please exercise caution *
>> ------------------------------
>> Hello me again
>>
>> Just found out that when our slurmctld restarts, all QoS assignments are gone.
>>
>> I mean users who have an association with the QoS cannot submit jobs with
>> sbatch; they get the error:
>>
>> sbatch: error: Batch job submission failed: Invalid qos specification
>>
>>
>> Do we need to make any more changes in slurm.conf so that the QoS settings
>> become permanent?
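>> For context, QoS definitions persist only when accounting runs through
>> slurmdbd; a sketch of the relevant slurm.conf lines (the host name is a
>> placeholder, and the enforce flags are the ones Sean suggested):
>>
>> ```shell
>> # slurm.conf fragment (sketch; <dbd-host> is a placeholder)
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageHost=<dbd-host>
>> AccountingStorageEnforce=associations,limits,qos,safe
>> ```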
>>
>> Amjad
>>
>> On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <amjadcsu at gmail.com> wrote:
>>
>> Hi Sean,
>>
>> Thanks for the suggestion, seems to work now.
>>
>> Majid
>>
>> On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <scrosby at unimelb.edu.au>
>> wrote:
>>
>> Hi Amjad,
>>
>> Make sure you have qos in the config entry AccountingStorageEnforce
>>
>> e.g.
>>
>> AccountingStorageEnforce=associations,limits,qos,safe
>>
>> Sean
>>
>> ------------------------------
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Amjad Syed <amjadcsu at gmail.com>
>> *Sent:* Friday, 27 August 2021 20:28
>> *To:* slurm-users at schedmd.com <slurm-users at schedmd.com>
>> *Subject:* [EXT] [slurm-users] User association with partition and Qos
>>
>> * External email: Please exercise caution *
>> ------------------------------
>> Hello all
>>
>> We are having trouble understanding how user associations interact with
>> partitions.
>>
>> Currently we have a partition with 30 GPU cards.
>>
>> We have defined a QoS gpu-rtx that allows a user to reserve 2 cards:
>>
>> sacctmgr show qos gpu-rtx format=MaxTRESPU%60
>>
>>                                                    MaxTRESPU
>>
>>        -----------------------------------------------------
>>                                            cpu=96,gres/gpu=2
>>
>>
>>
>>
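>> For reference, a QoS with that limit can be created with something like
>> the following (a sketch of the commands we used):
>>
>> ```shell
>> # Create the QoS and cap per-user TRES (values from the output above)
>> sacctmgr add qos gpu-rtx
>> sacctmgr modify qos gpu-rtx set MaxTRESPerUser=cpu=96,gres/gpu=2
>> ```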
>> We have defined a user test that is assoc with this qos
>>
>>
>> sacctmgr show assoc user=test format=user,qos
>>
>>
>> Qos
>>
>> gpu-rtx
>>
>>
>>
>> Now we define another qos  gpu-rtx-reserved  that allows gpu=8
>>
>>
>> sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
>>
>>                                                    MaxTRESPU
>>
>>        -----------------------------------------------------
>>                                            cpu=192,gres/gpu=8
>>
>> User test is not associated with the gpu-rtx-reserved QoS, so he should not
>> be able to use more than gpu=2.
>> Both of these qos are now in slurm.conf for the partition
>>
>> PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9
>> MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved
>>
>>
>>
>> But we found out that even though the user is not associated with
>> gpu-rtx-reserved, if he specifies gpu-rtx-reserved in his Slurm script he
>> can reserve 8 GPU cards.
>>
>>
>> So our question is: can users associated with one partition QoS use the
>> other QoS in the partition even if they are not associated with it? In
>> other words, can we define only one partition QoS and not more than one?
>>
>>
>> Hope I was able to explain.
>>
>>
>> Any advice if we want a partition to allow more than one QoS with different
>> limits, where users associated with one QoS should not be able to use the
>> other?
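>> A sketch of the behaviour we want, assuming per-user QoS lists are
>> enforced (i.e. qos is present in AccountingStorageEnforce):
>>
>> ```shell
>> # User test holds only gpu-rtx, so a job asking for the other QoS
>> # should be rejected with "Invalid qos specification"
>> sacctmgr modify user where user=test set qos=gpu-rtx
>> sbatch --qos=gpu-rtx-reserved job.sh
>> ```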
>>
>>
>> Majid
>>
>>
>>
>>
>>

