[slurm-users] [EXT] User association with partition and Qos

Amjad Syed amjadcsu at gmail.com
Tue Aug 31 07:46:40 UTC 2021


Hi Sean

Here is the output for the gpu-rtx-reserved QoS:

sacctmgr show account withassoc -p | grep gpu-rtx-reserved


default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|





scontrol show part gpu-rtx6000-2

PartitionName=gpu-rtx6000-2

   AllowGroups=ALL AllowAccounts=ALL
AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea

   AllocNodes=ALL Default=NO QoS=N/A

   DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
Hidden=NO

   MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED

   Nodes=g[15-29]

   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
OverSubscribe=NO

   OverTimeLimit=NONE PreemptMode=GANG,SUSPEND

   State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE

   JobDefaults=(null)

   DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED
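

(For reference: my understanding is that AllowQos on the partition only
controls which QoS the partition accepts, not which users may request them.
The per-user restriction was set on the association itself, with something
along the lines of

    sacctmgr modify user name=test set qos=gpu-rtx

but please correct me if that is not how the enforcement is supposed to work.)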




On a different note, we have the following in slurm.conf:


AccountingStorageUser=slurm


But we have been adding QoS and assigning users as root. Can this be an
issue?
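
(To check how the changes were recorded, something like this should show the
accounting admin level of each user, if I am reading the sacctmgr man page
correctly:

    sacctmgr show user format=User,Admin

I would expect root to have AdminLevel Administrator, but I am not certain
whether AccountingStorageUser affects this.)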




Amjad

On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby <scrosby at unimelb.edu.au> wrote:

> What does sacctmgr show for the user you added to have access to the QoS,
> and what does Slurm show for the partition config?
>
> sacctmgr show account withassoc -p
> scontrol show part gpu-rtx6000-2
>
> Sean
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Amjad Syed <amjadcsu at gmail.com>
> *Sent:* Tuesday, 31 August 2021 17:03
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] User association with partition and Qos
>
> * External email: Please exercise caution *
> ------------------------------
> Hello, it's me again.
>
> Just found out that when our slurmctld restarts, all QoS are gone.
>
> I mean users who have an association with the QoS cannot submit jobs with
> sbatch; they get the error:
>
> sbatch: error: Batch job submission failed: Invalid qos specification
>
>
> Do we need to make any more changes in slurm.conf so that the QoS becomes
> permanent?
>
> Amjad
>
> On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <amjadcsu at gmail.com> wrote:
>
> Hi Sean,
>
> Thanks for the suggestion, seems to work now.
>
> Majid
>
> On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <scrosby at unimelb.edu.au>
> wrote:
>
> Hi Amjad,
>
> Make sure you have qos in the config entry AccountingStorageEnforce
>
> e.g.
>
> AccountingStorageEnforce=associations,limits,qos,safe
>
> Sean
>
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Amjad Syed <amjadcsu at gmail.com>
> *Sent:* Friday, 27 August 2021 20:28
> *To:* slurm-users at schedmd.com <slurm-users at schedmd.com>
> *Subject:* [EXT] [slurm-users] User association with partition and Qos
>
> * External email: Please exercise caution *
> ------------------------------
> Hello all
>
> We are having an issue understanding user association and partition.
>
> Currently we have a partition with 30 GPU cards .
>
> We have defined a qos gpu-rtx that allows user to reserve 2 cards
>
> sacctmgr show qos gpu-rtx format=MaxTRESPU%60
>
>                                                    MaxTRESPU
>
>        -----------------------------------------------------
>                                            cpu=96,gres/gpu=2
>
>
>
>
> We have defined a user test that is assoc with this qos
>
>
> sacctmgr show assoc user=test format=user,qos
>
>
> Qos
>
> gpu-rtx
>
>
>
> Now we define another qos  gpu-rtx-reserved  that allows gpu=8
>
>
> sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
>
>                                                    MaxTRESPU
>
>        -----------------------------------------------------
>                                            cpu=192,gres/gpu=8
>
> User test is not associated with the gpu-rtx-reserved QoS, so he should not
> be able to use more than gpu=2.
> Both of these qos are now in slurm.conf for the partition
>
> PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9
> MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved
>
>
>
> But we found out that even though the user is not associated with
> gpu-rtx-reserved, if he uses gpu-rtx-reserved in his Slurm script, he can
> reserve 8 GPU cards.
>
>
> So our question is: can users associated with one partition QoS use the
> other QoS in the partition even if they are not associated with it? Or, in
> other words, can we only define one partition QoS and not more than one?
>
>
> Hope I was able to explain.
>
>
> Any advice if we want a partition to use more than one QoS with different
> limits, where users associated with one QoS should not be able to use the
> other?
>
>
> Majid
>
>
>
>
>
