[slurm-users] [EXT] User association with partition and Qos
Sean Crosby
scrosby at unimelb.edu.au
Tue Aug 31 07:20:06 UTC 2021
What does sacctmgr show for the user you added to have access to the QoS, and what does Slurm show for the partition config?
sacctmgr show account withassoc -p
scontrol show part gpu-rtx6000-2
Sean
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Amjad Syed <amjadcsu at gmail.com>
Sent: Tuesday, 31 August 2021 17:03
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] User association with partition and Qos
Hello, me again.
I just found out that when our slurmctld restarts, all QOS are gone.
That is, users who have an association with the QOS cannot submit jobs with sbatch; they get this error:
sbatch: error: Batch job submission failed: Invalid qos specification
Do we need to make any more changes in slurm.conf so that the QOS become permanent?
Amjad
On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <amjadcsu at gmail.com> wrote:
Hi Sean,
Thanks for the suggestion, seems to work now.
Majid
On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <scrosby at unimelb.edu.au> wrote:
Hi Amjad,
Make sure you have qos in the config entry AccountingStorageEnforce
e.g.
AccountingStorageEnforce=associations,limits,qos,safe
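To double-check that setting, here is a minimal sketch. The enforces_qos function is a hypothetical helper, not part of Slurm. It reads slurm.conf-style text on stdin and succeeds if the last uncommented AccountingStorageEnforce line lists qos (or all, which I am assuming also turns QOS enforcement on).

```shell
#!/bin/sh
# enforces_qos: hypothetical helper, not part of Slurm.
# Reads slurm.conf-style text on stdin; exits 0 if the last uncommented
# AccountingStorageEnforce line contains "qos" (or "all"), 1 otherwise.
enforces_qos() {
    awk '
        /^[ \t]*#/ { next }                      # skip full-line comments
        {
            line = tolower($0)                   # compare case-insensitively
            if (line ~ /^[ \t]*accountingstorageenforce[ \t]*=/) {
                sub(/^[^=]*=/, "", line)         # keep only the value after "="
                sub(/#.*/, "", line)             # drop any trailing comment
                gsub(/[ \t]/, "", line)          # drop stray whitespace
                value = line                     # a later line overrides an earlier one
            }
        }
        END {
            n = split(value, flags, ",")
            for (i = 1; i <= n; i++)
                if (flags[i] == "qos" || flags[i] == "all") exit 0
            exit 1
        }'
}

# On a live cluster, the value the controller is actually running with
# can be inspected with:
#   scontrol show config | grep -i AccountingStorageEnforce
```

Usage would be something like `enforces_qos < /etc/slurm/slurm.conf && echo "qos enforced"`; the path is an assumption, so adjust it to wherever your slurm.conf lives.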
Sean
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Amjad Syed <amjadcsu at gmail.com>
Sent: Friday, 27 August 2021 20:28
To: slurm-users at schedmd.com <slurm-users at schedmd.com>
Subject: [EXT] [slurm-users] User association with partition and Qos
Hello all,
We are having trouble understanding how user associations and partitions interact.
Currently we have a partition with 30 GPU cards.
We have defined a QOS, gpu-rtx, that allows a user to reserve 2 cards:
sacctmgr show qos gpu-rtx format=MaxTRESPU%60
MaxTRESPU
-----------------------------------------------------
cpu=96,gres/gpu=2
We have defined a user, test, that is associated with this QOS:
sacctmgr show assoc user=test format=user,qos
Qos
gpu-rtx
Now we define another QOS, gpu-rtx-reserved, that allows gpu=8:
sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
MaxTRESPU
-----------------------------------------------------
cpu=192,gres/gpu=8
User test is not associated with the gpu-rtx-reserved QOS, so he should not be able to use more than gpu=2.
Both of these QOS are now listed in slurm.conf for the partition:
PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved
But we found that even though the user is not associated with gpu-rtx-reserved, if he requests gpu-rtx-reserved in his Slurm script, he can reserve 8 GPU cards.
So our question is: can users associated with one of the partition's QOS use another QOS allowed on that partition even if they are not associated with it? Or, in other words, can we only define one QOS per partition and not more than one?
I hope I was able to explain this clearly.
Any advice on how to let a partition use more than one QOS with different limits, so that users associated with one QOS cannot use the other?
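To make the question concrete, here is a rough model of the check being asked about. This is not Slurm code, only a simplified sketch: the function names are made up for illustration, and it assumes the partition's AllowQos list is always consulted while the user's association QOS list is consulted only when qos appears in AccountingStorageEnforce.

```shell
#!/bin/sh
# Simplified model of the QOS admission check discussed in this thread.
# NOT Slurm code; function and argument names are illustrative only.

# in_list ITEM COMMA_LIST: succeed if ITEM is one of the comma-separated entries.
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# qos_permitted REQUESTED_QOS PARTITION_ALLOWQOS USER_ASSOC_QOS ENFORCE_FLAGS
qos_permitted() {
    # Assumption: the partition's AllowQos list is always checked.
    in_list "$1" "$2" || return 1
    # Assumption: the user's association is checked only when "qos" is enforced.
    if in_list qos "$4"; then
        in_list "$1" "$3" || return 1
    fi
    return 0
}

# Without "qos" in the enforce flags (the flag list here is assumed),
# user test can pick gpu-rtx-reserved:
qos_permitted gpu-rtx-reserved gpu-rtx,gpu-rtx-reserved gpu-rtx associations,limits \
    && echo "accepted without qos enforcement"

# With "qos" enforced, the same request is refused:
qos_permitted gpu-rtx-reserved gpu-rtx,gpu-rtx-reserved gpu-rtx associations,limits,qos,safe \
    || echo "rejected with qos enforcement"
```

Under this model a partition can list several QOS in AllowQos; which of them a given user may actually request is then decided by the user's association, provided QOS enforcement is switched on.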
Majid