[slurm-users] associations, limits,qos

Nizar Abed nizar at huji.ac.il
Mon Jan 25 14:03:55 UTC 2021


Hi,

Right, I understand this. What I’m describing is:

If a job is submitted to multiple partitions (-p part1,part2,...), then once the required resources become available in one of those partitions, I’d expect the job to be dispatched under the association (QOS and limits) that matches that partition. That is not what happens.
Jobs start running on a partition with a QOS (apparently of higher priority), although there is no association entry for that partition/QOS combination.
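
To make the expectation concrete, here is a toy Python model. It is only a sketch of the behavior I would expect, not of how slurmctld actually schedules. The names `Association`, `dispatch` and `fresh_associations` are made up for illustration; the limits 3 and 4 are the MaxJobs values of the two associations from my original message quoted below, and the model assumes both partitions have free resources.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """One (user, partition) association with its MaxJobs limit (hypothetical model)."""
    partition: str
    max_jobs: int
    running: int = 0


def dispatch(num_jobs, requested, associations):
    """Toy model of the *expected* behavior, not Slurm's scheduler.

    Each job tries its requested partitions in order and starts on the first
    one whose own association still has room; otherwise it stays pending.
    """
    by_partition = {a.partition: a for a in associations}
    placement = {p: 0 for p in requested}
    pending = 0
    for _ in range(num_jobs):
        for part in requested:
            assoc = by_partition.get(part)
            if assoc is not None and assoc.running < assoc.max_jobs:
                assoc.running += 1
                placement[part] += 1
                break
        else:
            # No requested partition has room under its association.
            pending += 1
    return placement, pending


def fresh_associations():
    # MaxJobs values taken from the associations listed in the quoted message.
    return [Association("part1.q", 3), Association("part2.q", 4)]


placement, pending = dispatch(12, ["part1.q", "part2.q"], fresh_associations())
print(placement, pending)  # {'part1.q': 3, 'part2.q': 4} 5
```

Under this model, 12 jobs submitted with -p part1.q,part2.q would give 3 running on part1.q, 4 running on part2.q and 5 pending, regardless of the order in which the partitions are listed. What I actually observe is different.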

Similar case:
https://bugs.schedmd.com/show_bug.cgi?id=1032

All the best,
Nizar


> On 25 Jan 2021, at 15:46, Durai Arasan <arasan.durai at gmail.com> wrote:
> 
> Hi,
> 
> Jobs submitted with sbatch cannot run on multiple partitions. The job will be submitted to the partition where it can start first. (from sbatch reference)
> 
> Best,
> Durai
> 
> On Sat, Jan 23, 2021 at 6:50 AM Nizar Abed <nizar at huji.ac.il> wrote:
> Hi list,
> 
> I’m trying to enforce limits based on associations, but the behavior is not what I expected.
> 
> In slurm.conf:
> AccountingStorageEnforce=associations,limit,qos
> 
> Two partitions:
> part1.q
> part2.q
> 
> One user:
> user1
> 
> One QOS:
> qos1
> MaxJobsPU is not set
> 
> 
> I’d like to have one association for user1 per partition, each with the same QOS:
> 
>       User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS 
> ---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- --------- 
>      user1  account1      None       cl1   account1       part1.q         1               3                                                                       qos1           
>      user1  account1      None       cl1   account1       part2.q         1               4                                                                       qos1           
> 
> user1 submits 6 jobs to part2.q:
> 4 start running
> 2 are pending (AssocMaxJobsLimit)
> 
> 
> user1 submits 6 jobs to part1.q:
> 3 start running
> 3 are pending (AssocMaxJobsLimit)
> 
> This is OK and the expected behavior.
> 
> But when user1 submits 12 jobs like:
> 
> sbatch -p part1.q,part2.q slurm-job.sh 
> 
> Only 3 jobs run, all on part1.q (matching the part1.q association).
> The other 9 jobs are pending with AssocMaxJobsLimit.
> 
> Why don’t 4 jobs start on part2.q?
> 
> Worst case (listing part2.q before part1.q):
> sbatch -p part2.q,part1.q slurm-job.sh 
> 4 (!) jobs run on part1.q
> 
> 
> Is it possible to allow a user to submit to multiple partitions and have Slurm pick the correct association for each partition?
> What am I missing here?
> 
> 
> Thanks,
> Nizar
> 
> 
> 
> 
