[slurm-users] Array jobs vs Fairshare

Stephan Schott schottve at hhu.de
Fri Oct 23 22:53:47 UTC 2020


Apparently there is not much out there regarding this. To me it seems that
once an array job has started, its tasks keep being dispatched without
re-checking whether the user's Fairshare factor would actually allow
another task to start. That would be far from ideal, as it opens the door
to abuse of the partitions, but maybe I am just wrong and missing some
parameter. It would be nice if someone with more experience could chip in.
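
In case anyone wants to reproduce what I am seeing, something like this
should show the relevant numbers (the user name is just a placeholder):

  # FairShare factor and accumulated usage of the user
  sshare -u someuser

  # per-factor priority breakdown of the user's pending jobs
  sprio -l -u someuser

  # one line per pending array task instead of the collapsed array
  squeue -r -t PD -u someuser
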
Cheers,

On Wed, Oct 21, 2020 at 16:09, Riebs, Andy (<andy.riebs at hpe.com>)
wrote:

> Thanks for the additional information, Stephan!
>
>
>
> At this point, I’ll have to ask for anyone with more job array experience
> than I have (because I have none!) to speak up.
>
>
>
> Remember that we’re all in this together(*), so any help that anyone can
> offer will be good!
>
>
>
> Andy
>
>
>
> (*) Well, actually, I’m retiring at the end of the week, so I’m not sure
> that I’ll have a lot of Slurm in my life, going forward :-)
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Stephan Schott
> *Sent:* Wednesday, October 21, 2020 9:40 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Array jobs vs Fairshare
>
>
>
> And I forgot to mention, things are running in a Qlustar cluster based on
> Ubuntu 18.04.4 LTS Bionic. 😬
>
>
>
> On Wed, Oct 21, 2020 at 15:38, Stephan Schott (<schottve at hhu.de>)
> wrote:
>
> Oh, sure, sorry.
>
> We are using Slurm 18.08.8 with the backfill scheduler. The jobs are being
> assigned to the same partition, which limits GPUs and CPUs to 1 via QOS.
> Here are some of the main settings:
>
>
>
> SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty
> --preserve-env --mpi=none $SHELL"
> TaskPlugin=task/affinity,task/cgroup
> TaskPluginParam=Sched
> MinJobAge=300
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
> PreemptType=preempt/qos
> PreemptMode=requeue
> PriorityType=priority/multifactor
> PriorityFlags=FAIR_TREE
> PriorityFavorSmall=YES
> FairShareDampeningFactor=5
> PriorityWeightAge=1000
> PriorityWeightFairshare=5000
> PriorityWeightJobSize=1000
> PriorityWeightPartition=1000
> PriorityWeightQOS=5000
> PriorityWeightTRES=gres/gpu=1000
> AccountingStorageEnforce=limits,qos,nosteps
> AccountingStorageTRES=gres/gpu
> AccountingStorageHost=localhost
> AccountingStorageType=accounting_storage/slurmdbd
> JobCompType=jobcomp/none
> JobAcctGatherFrequency=30
>
> JobAcctGatherType=jobacct_gather/cgroup
>
>
>
> Any ideas?
>
>
>
> Cheers,
>
>
>
> On Wed, Oct 21, 2020 at 15:17, Riebs, Andy (<andy.riebs at hpe.com>)
> wrote:
>
> Also, of course, any of the information that you can provide about how the
> system is configured: scheduler choices, QOS options, and the like, would
> also help in answering your question.
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Riebs, Andy
> *Sent:* Wednesday, October 21, 2020 9:02 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Array jobs vs Fairshare
>
>
>
> Stephan (et al.),
>
>
>
> There are probably 6 versions of Slurm in common use today, across
> multiple versions each of Debian/Ubuntu, SuSE/SLES, and
> RedHat/CentOS/Fedora. You are more likely to get a good answer if you offer
> some hints about what you are running!
>
>
>
> Regards,
>
> Andy
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Stephan Schott
> *Sent:* Wednesday, October 21, 2020 8:37 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* [slurm-users] Array jobs vs Fairshare
>
>
>
> Hi everyone,
>
> I am having doubts regarding array jobs. To me it seems that the
> JobArrayTaskLimit takes precedence over Fairshare, as users with a much
> lower priority seem to get constant allocations for their array jobs
> compared to users with "normal" jobs. Can someone confirm this?
>
> Cheers,
>
>
> --
>
> Stephan Schott Verdugo
>
> Biochemist
>
>
> Heinrich-Heine-Universitaet Duesseldorf
> Institut fuer Pharm. und Med. Chemie
> Universitaetsstr. 1
> 40225 Duesseldorf
> Germany
>
>
>
> --
>
> Stephan Schott Verdugo
>
> Biochemist
>
>
> Heinrich-Heine-Universitaet Duesseldorf
> Institut fuer Pharm. und Med. Chemie
> Universitaetsstr. 1
> 40225 Duesseldorf
> Germany
>
>
>
> --
>
> Stephan Schott Verdugo
>
> Biochemist
>
>
> Heinrich-Heine-Universitaet Duesseldorf
> Institut fuer Pharm. und Med. Chemie
> Universitaetsstr. 1
> 40225 Duesseldorf
> Germany
>


-- 
Stephan Schott Verdugo
Biochemist

Heinrich-Heine-Universitaet Duesseldorf
Institut fuer Pharm. und Med. Chemie
Universitaetsstr. 1
40225 Duesseldorf
Germany
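
P.S. In case it helps anyone running into the same thing: the "%"
separator of --array at least caps how many tasks of a single array can
run at the same time (the numbers and script name below are only an
example):

  sbatch --array=1-500%10 run_tasks.sh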