<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1676956461;
mso-list-type:hybrid;
mso-list-template-ids:-1783171012 -384004384 201916419 201916421 201916417 201916419 201916421 201916417 201916419 201916421;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:-;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:20.25pt;
text-indent:-18.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-font-family:Calibri;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:56.25pt;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:92.25pt;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:128.25pt;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:164.25pt;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:200.25pt;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:236.25pt;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:272.25pt;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:308.25pt;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-AU" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Hi John,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">I’ll volunteer an opinion. There are circumstances where slurm could contribute to slower overall times for tasks, as slurm can be configured to do pre-job setup and post-job followup (prologue/epilogue).
However, you are reporting within-task timing, not overall timing so this is beside the point.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">In general if running ‘the same code’ gives different results (or different performance) then the difference will be in the environment/context. Slurm is a part of this context but probably not directly
related… More likely perhaps are:<o:p></o:p></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">Hardware variations<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">Contention for cpu/memory/network<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">cgroup constraints (And other OS differences)<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">IO connectivity differences (and caching)<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">Maybe effects from past activity – is extra work needed to free memory or cache content – special case of contention.<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">Code differences (possibly from dynamically loaded libraries or a different interpreter/python version).<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:-15.75pt;mso-list:l0 level1 lfo1">
<span style="mso-fareast-language:EN-US">I guess shell environment settings could matter (probably not in your case and probably not one you are within python).<o:p></o:p></span></li></ul>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">In your case, perhaps system calls are slower in one context vs another. t.time() might be slower.
<o:p></o:p></span></p>
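
If you want to test that idea directly, a rough, untested sketch like the one below times the timer call itself; running it in both contexts would show whether t.time() is the slow part:

# Rough, untested sketch: measure the average cost of t.time() itself.
# If the timer (which may involve a system call) is much slower under
# slurm, it will show up here; otherwise the difference lies elsewhere.
import time as t

N = 1_000_000
t0 = t.perf_counter()
for _ in range(N):
    t.time()
t1 = t.perf_counter()
print("average cost of t.time():", (t1 - t0) / N, "seconds")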
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">You might use strace -tttT … to see if there are slow system calls in one context vs the other.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Good luck!<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Gareth<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> slurm-users <slurm-users-bounces@lists.schedmd.com>
<b>On Behalf Of </b>Alpha Experiment<br>
<b>Sent:</b> Tuesday, 15 December 2020 6:20 PM<br>
<b>To:</b> Slurm User Community List <slurm-users@lists.schedmd.com><br>
<b>Subject:</b> [slurm-users] Scripts run slower in slurm?<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi,<br>
<br>
I made a short script in python to test if slurm was correctly limiting the number of CPUs available to each partition. The script is as follows:<br>
<span style="font-family:"Courier New"">import multiprocessing as mp<br>
import time as t<br>
<br>
def fibonacci(n):<br>
n = int(n)<br>
def fibon(a,b,n,result):<br>
c = a+b<br>
result.append(c)<br>
if c < n:<br>
fibon(b,c,n,result)<br>
return result<br>
return fibon(0,1,n,[])<br>
<br>
def calcnfib(n):<br>
res = fibonacci(n)<br>
return res[-1]</span><o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New""><br>
def benchmark(pool):<br>
t0 = t.time()<br>
out = pool.map(calcnfib, range(1000000, 1000000000,1000))<br>
tf = t.time()<br>
return str(tf-t0)<br>
<br>
pool = mp.Pool(4)<br>
print("4: " + benchmark(pool))<br>
<br>
pool = mp.Pool(32)<br>
print("32: " + benchmark(pool))<br>
<br>
pool = mp.Pool(64)<br>
print("64: " + benchmark(pool))<br>
<br>
pool = mp.Pool(128)<br>
print("128: " + benchmark(pool))</span><br>

It is called using the following submission script:

#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
source testenv1/bin/activate
python3 multithread_example.py

The slurm out file reads

4: 5.660163640975952
32: 5.762076139450073
64: 5.8220226764678955
128: 5.85421347618103

However, if I run

source testenv1/bin/activate
python3 multithread_example.py

I find faster and more expected behavior:

4: 1.5878620147705078
32: 0.34162330627441406
64: 0.24987316131591797
128: 0.2247719764709472

For reference my slurm configuration file is

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
#SlurmctldHost=localhost
ControlMachine=localhost

#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/home/slurm/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/home/slurm/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/slurmd/
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/home/slurm/spool/
SwitchType=switch/none
TaskPlugin=task/affinity

# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core

# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/home/slurm/log/slurmctld.log
#SlurmdDebug=info
#SlurmdLogFile=

# COMPUTE NODES
NodeName=localhost CPUs=128 RealMemory=257682 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN
PartitionName=full Nodes=localhost Default=YES MaxTime=INFINITE State=UP
PartitionName=half Nodes=localhost Default=NO MaxTime=INFINITE State=UP MaxNodes=1 MaxCPUsPerNode=64 MaxMemPerNode=128841
<p class="MsoNormal"><span style="font-family:"Arial",sans-serif">Here is my cgroup.conf file as well</span><br>
<span style="font-family:"Courier New"">CgroupAutomount=yes</span><br>
<span style="font-family:"Courier New"">ConstrainCores=no</span><br>
<span style="font-family:"Courier New"">ConstrainRAMSpace=no</span><o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>

If anyone has any suggestions for what might be going wrong and why the script takes much longer when run with slurm, please let me know!

Best,
John