[slurm-users] associations, limits,qos
Nizar Abed
nizar at huji.ac.il
Mon Jan 25 14:03:55 UTC 2021
Hi,
Right, I understand this. What I’m describing is:
A job is submitted to multiple partitions (-p part1,part2,...).
When the required resources become available in a partition where the job can run, I’d expect the job to be dispatched to that partition under the correct association (QOS and limits), but that is not the case.
Jobs start running on a partition with a QOS (apparently the higher-priority one), although there is no association entry for that partition/QOS combination.
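
To make the expectation concrete, here is a toy model in plain shell. It is only a sketch of the dispatch I would expect, not Slurm's actual scheduling logic; the MaxJobs values (3 for part1.q, 4 for part2.q) and the 12 submitted jobs are taken from the quoted post below, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Toy model of the expected dispatch; this is NOT how Slurm is implemented.
# Limits are the MaxJobs values of the two associations in the quoted post.
max_part1=3   # user1 / account1 / part1.q
max_part2=4   # user1 / account1 / part2.q
run_part1=0
run_part2=0
pending=0

job=1
while [ "$job" -le 12 ]; do
    # Try the partitions in the order given to "sbatch -p part1.q,part2.q"
    # and count the job against the association of the partition it lands on.
    if [ "$run_part1" -lt "$max_part1" ]; then
        run_part1=$((run_part1 + 1))
    elif [ "$run_part2" -lt "$max_part2" ]; then
        run_part2=$((run_part2 + 1))
    else
        pending=$((pending + 1))
    fi
    job=$((job + 1))
done

echo "part1.q=$run_part1 part2.q=$run_part2 pending=$pending"
```

Under this model 7 jobs would run (3 on part1.q, 4 on part2.q) and 5 would stay pending, whereas what I actually see is 3 running and 9 pending.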
Similar case:
https://bugs.schedmd.com/show_bug.cgi?id=1032
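
For anyone who wants to reproduce this, the following is a rough sketch of how the running jobs can be compared against the configured associations. The user name user1 comes from the quoted post; the column selection and widths are just one possible choice, and the guard around the commands is only there so the snippet does nothing harmful on a machine without the Slurm client tools.

```shell
#!/bin/sh
# Sketch only: user1 is the test user from the quoted post.
if command -v squeue >/dev/null 2>&1 && command -v sacctmgr >/dev/null 2>&1; then
    # Job ID, partition, QOS, state and pending reason for each of user1's jobs.
    squeue -u user1 -o "%.10i %.12P %.10q %.10T %r"
    # Associations that exist for user1, one row per partition.
    sacctmgr show assoc where user=user1 format=user,account,partition,maxjobs,qos
    checked=yes
else
    echo "Slurm client commands not found; run this on a submit host." >&2
    checked=no
fi
```

If a job shows up as running in a partition for which the second command lists no matching row, that is the situation described above.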
All the best,
Nizar
> On 25 Jan 2021, at 15:46, Durai Arasan <arasan.durai at gmail.com> wrote:
>
> Hi,
>
> Jobs submitted with sbatch cannot run on multiple partitions. The job will be submitted to the partition where it can start first. (from sbatch reference)
>
> Best,
> Durai
>
> On Sat, Jan 23, 2021 at 6:50 AM Nizar Abed <nizar at huji.ac.il> wrote:
> Hi list,
>
> I’m trying to enforce limits based on associations, but the behavior is not what I expected.
>
> In slurm.conf:
> AccountingStorageEnforce=associations,limit,qos
>
> Two partitions:
> part1.q
> part2.q
>
> One user:
> user1
>
> One QOS:
> qos1
> MaxJobsPU is not set
>
>
> I’d like to have an association for user1 for each partition, with the same QOS:
>
>       User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
> ---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
>      user1   account1      None        cl1   account1    part1.q         1                  3                                                                     qos1
>      user1   account1      None        cl1   account1    part2.q         1                  4                                                                     qos1
>
> user1 submits 6 jobs to part2.q:
> 4 start running
> 2 are pending (AssocMaxJobsLimit)
>
>
> user1 submits 6 jobs to part1.q:
> 3 start running
> 3 are pending (AssocMaxJobsLimit)
>
> This is OK and the expected behavior.
>
> But when user1 submits 12 jobs like this:
>
> sbatch -p part1.q,part2.q slurm-job.sh
>
> Only 3 jobs run, all on part1.q (the limit of the part1.q association).
> The other 9 jobs are pending (AssocMaxJobsLimit).
>
> Why don’t 4 jobs start on part2.q?
>
> Worse case (listing part2.q before part1.q):
> sbatch -p part2.q,part1.q slurm-job.sh
> 4(!) jobs run on part1.q
>
>
> Is it possible to let a user submit to multiple partitions and have Slurm pick the correct association for each partition?
> What am I missing here?
>
>
> Thanks,
> Nizar