<div dir="ltr">There does not seem to be much information out there about this. It seems to me that once an array job has started, its remaining tasks keep starting without any check of whether the user's Fairshare factor would actually allow another task to start. That would be far from ideal, as it opens the door to abuse of the partitions, but maybe I am simply wrong and missing some parameter. It would be nice if someone with more experience could chime in.<div>Cheers,</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 21 Oct 2020 at 16:09, Riebs, Andy (<<a href="mailto:andy.riebs@hpe.com">andy.riebs@hpe.com</a>>) wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-GB">
<div class="gmail-m_-1323284019538036075WordSection1">
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Thanks for the additional information, Stephan!<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">At this point, I’ll have to ask anyone with more job array experience than I have (because I have none!) to speak up.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Remember that we’re all in this together(*), so any help that anyone can offer will be good!<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Andy<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">(*) Well, actually, I’m retiring at the end of the week, so I’m not sure that I’ll have a lot of Slurm in my life going forward 🙂<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif"> slurm-users [mailto:<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>]
<b>On Behalf Of </b>Stephan Schott<br>
<b>Sent:</b> Wednesday, October 21, 2020 9:40 AM<br>
<b>To:</b> Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> Re: [slurm-users] Array jobs vs Fairshare<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">I forgot to mention: everything is running on a Qlustar cluster based on Ubuntu 18.04.4 LTS (Bionic).
<span style="font-family:'Segoe UI Symbol',sans-serif">😬</span><u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Wed, 21 Oct 2020 at 15:38, Stephan Schott (<<a href="mailto:schottve@hhu.de" target="_blank">schottve@hhu.de</a>>) wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal">Oh, sure, sorry.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">We are using Slurm 18.08.8 with the backfill scheduler. All jobs are assigned to the same partition, which limits GPUs and CPUs to 1 via QOS. Here are some of the main settings:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:'Courier New'">SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL"<br>
TaskPlugin=task/affinity,task/cgroup<br>
TaskPluginParam=Sched<br>
MinJobAge=300<br>
FastSchedule=1<br>
SchedulerType=sched/backfill<br>
SelectType=select/cons_res<br>
SelectTypeParameters=CR_CPU_Memory<br>
PreemptType=preempt/qos<br>
PreemptMode=requeue<br>
PriorityType=priority/multifactor<br>
PriorityFlags=FAIR_TREE<br>
PriorityFavorSmall=YES<br>
FairShareDampeningFactor=5<br>
PriorityWeightAge=1000<br>
PriorityWeightFairshare=5000<br>
PriorityWeightJobSize=1000<br>
PriorityWeightPartition=1000<br>
PriorityWeightQOS=5000<br>
PriorityWeightTRES=gres/gpu=1000<br>
AccountingStorageEnforce=limits,qos,nosteps<br>
AccountingStorageTRES=gres/gpu<br>
AccountingStorageHost=localhost<br>
AccountingStorageType=accounting_storage/slurmdbd<br>
JobCompType=jobcomp/none<br>
JobAcctGatherFrequency=30</span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:'Courier New'">JobAcctGatherType=jobacct_gather/cgroup</span><u></u><u></u></p>
</div>
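<div>
<p class="MsoNormal">For reference, a rough sketch of how the PriorityWeight* settings in the slurm.conf excerpt above combine, assuming the weighted-sum form documented for priority/multifactor, where each factor is a value between 0.0 and 1.0. The function name, the dictionary layout, and all factor values are made up for illustration, and terms such as the nice value are left out, so this is not how slurmctld itself is implemented.</p>
<pre>
```python
# Weights copied from the slurm.conf excerpt above (PriorityWeight* lines).
WEIGHTS = {
    "age": 1000,
    "fairshare": 5000,
    "jobsize": 1000,
    "partition": 1000,
    "qos": 5000,
    "gres/gpu": 1000,
}


def job_priority(factors, weights=WEIGHTS):
    """Return the weighted sum of the given factors.

    `factors` maps a factor name to a value between 0.0 and 1.0.
    Factors that are not listed contribute nothing. Simplification:
    terms such as the nice value are left out.
    """
    total = 0.0
    for name, value in factors.items():
        # Reject values outside [0.0, 1.0] by comparing with the clamped value.
        if value != min(1.0, max(0.0, value)):
            raise ValueError(f"factor {name!r} must be between 0.0 and 1.0")
        total += weights[name] * value
    return total


if __name__ == "__main__":
    # Hypothetical factor values, chosen only for illustration.
    heavy_user_job = {"fairshare": 0.125, "age": 1.0, "jobsize": 1.0}
    light_user_job = {"fairshare": 0.5, "age": 0.0, "jobsize": 0.0}
    print("heavy user:", job_priority(heavy_user_job))
    print("light user:", job_priority(light_user_job))
```
</pre>
<p class="MsoNormal">With these made-up numbers, the job of the user with the lower fair-share factor still ends up ahead, because age and job size together add more than the fair-share difference takes away. This only describes how pending jobs are ordered; it says nothing about how array tasks are handled.</p>
</div>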
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Any ideas?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Cheers,<u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Wed, 21 Oct 2020 at 15:17, Riebs, Andy (<<a href="mailto:andy.riebs@hpe.com" target="_blank">andy.riebs@hpe.com</a>>) wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Also, of course, any information that you can provide about how the system is configured (scheduler
choices, QOS options, and the like) would help in answering your question.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p>
<div>
<div style="border-right:none currentcolor;border-bottom:none currentcolor;border-left:none currentcolor;border-top:1pt solid currentcolor;padding:3pt 0in 0in">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif"> slurm-users
[mailto:<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>]
<b>On Behalf Of </b>Riebs, Andy<br>
<b>Sent:</b> Wednesday, October 21, 2020 9:02 AM<br>
<b>To:</b> Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> Re: [slurm-users] Array jobs vs Fairshare</span><u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Stephan (et al.),</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">There are probably 6 versions of Slurm in common use today, across multiple versions each of Debian/Ubuntu,
SuSE/SLES, and RedHat/CentOS/Fedora. You are more likely to get a good answer if you offer some hints about what you are running!</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Regards,</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Andy</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif"> slurm-users
[<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">mailto:slurm-users-bounces@lists.schedmd.com</a>]
<b>On Behalf Of </b>Stephan Schott<br>
<b>Sent:</b> Wednesday, October 21, 2020 8:37 AM<br>
<b>To:</b> Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> [slurm-users] Array jobs vs Fairshare</span><u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="MsoNormal">Hi everyone,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">I have a question about array jobs. It seems to me that JobArrayTaskLimit takes precedence over Fairshare: users with a much lower priority seem to get constant allocations
for their array jobs, compared with users submitting "normal" jobs. Can someone confirm this?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Cheers,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><br>
-- <u></u><u></u></p>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Stephan Schott Verdugo</span><u></u><u></u></p>
</div>
<p class="MsoNormal"><span style="font-size:9.5pt">Biochemist</span><u></u><u></u></p>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt"><br>
Heinrich-Heine-Universitaet Duesseldorf<br>
Institut fuer Pharm. und Med. Chemie<br>
Universitaetsstr. 1<br>
40225 Duesseldorf<br>
Germany</span><u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div style="font-size:12.8px">Stephan Schott Verdugo<br></div><span style="font-size:12.8px">Biochemist</span><br style="font-size:12.8px"><div style="font-size:12.8px"><br>Heinrich-Heine-Universitaet Duesseldorf<br>Institut fuer Pharm. und Med. Chemie<br>Universitaetsstr. 1<br>40225 Duesseldorf<br>Germany</div></div></div>