<div dir="ltr">
<p>I'm trying to set up GANG scheduling with Slurm on my single-node 
server so that people at the lab can run experiments without blocking 
each other. For example, if someone runs code that takes days to 
finish, shorter jobs should get a chance to run alternately with it 
instead of waiting days for their turn.</p>
<p>I followed the GANG scheduling <code>slurm.conf</code> setup tutorial on the Slurm website. To check whether it works, I launched a bunch of jobs 
that print the current time and then sleep for a while. But when I check 
<code>squeue</code>, the jobs never alternate; they run sequentially, one after the 
other. Why is this happening?</p>
<p>Here's my <code>slurm.conf</code> file:</p>
<pre style="margin-left:40px"><code># See the slurm.conf man page for more information.
ClusterName=localcluster
SlurmctldHost=localhost
ProctrackType=proctrack/linuxproc
ReturnToService=1                                                                                                                                                         
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm                                                                                                                                                           
StateSaveLocation=/var/lib/slurm/slurmctld                                                                                                                                
TaskPlugin=task/none                                                                                                                                                      
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
# Set so that each job alternates every 15 seconds
SchedulerTimeSlice=15                                                                                                                                                     
SchedulerType=sched/builtin
SelectType=select/linear
SelectTypeParameters=CR_Memory                                                                                                                                            
PreemptMode=GANG
# LOGGING AND ACCOUNTING
JobCompType=jobcomp/none
JobAcctGatherFrequency=30                                                                                                                                                 
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log                                                                                                                                   
# COMPUTE NODES
NodeName=lab04 CPUs=48 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN RealMemory=257249
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=FORCE:6 DefMemPerNode=257249 MaxMemPerNode=257249</code></pre>

<p>From what I understand, <code>SchedulerTimeSlice=15</code> means that jobs should alternate running every 15 seconds.</p>
<p>This is the job I'm launching with <code>sbatch</code> (launching many copies of this job one after the other):</p><div>
<pre style="margin-left:40px"><code>#!/bin/bash
#SBATCH -J test # Job name
#SBATCH -o job.%j.out # Name of stdout output file (%j expands to %jobId)
#SBATCH -N 1 # Total number of nodes requested
echo "Test output from Slurm Testjob"
date                                                                                                                                                                      
sleep 10                                                                                                                                                                  
date                                                                                                                                                                      
sleep 10
date
sleep 10
date
sleep 10</code></pre><p>I would expect each job to print one or two dates, then be suspended while the Slurm 
scheduler lets another job run, so that its final dates are printed with a 
delay much greater than 10 seconds.</p>
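<p>For anyone who wants to reproduce this, a loop along these lines is one way to submit the copies back to back. It is only a sketch: <code>submit_copies</code>, the <code>SUBMIT_CMD</code> override and the file name <code>test_job.sh</code> are names made up for illustration; only <code>sbatch</code> itself is the real Slurm command.</p>

```shell
#!/bin/bash
# Submit several copies of the same batch script, one after the other.
# SUBMIT_CMD is a hypothetical override (not a Slurm variable) so the loop
# can be dry-run with "echo" instead of actually calling sbatch.
submit_copies() {
    local copies="$1" script="$2" i
    for i in $(seq 1 "$copies"); do
        "${SUBMIT_CMD:-sbatch}" "$script"
    done
}
```

<p>Usage, assuming the job script above is saved as <code>test_job.sh</code>: <code>submit_copies 5 test_job.sh</code>.</p>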
<p>However after launching many copies of this job this is what I see on <code>squeue</code>:</p>
<p>Before the first job launched is done:</p>


<pre><code>             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                18    LocalQ     test eiarussi PD       0:00      1 (Resources)
                19    LocalQ     test eiarussi PD       0:00      1 (Priority)                                                                                                            
                20    LocalQ     test eiarussi PD       0:00      1 (Priority)                                                                                                            
                21    LocalQ     test eiarussi PD       0:00      1 (Priority)                                                                                                            
                17    LocalQ     test eiarussi  R       0:31      1 lab04</code></pre>

</div><div>
After that job ends: </div><div>
<pre><code>            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)                                                                                                      
                19    LocalQ     test eiarussi PD       0:00      1 (Resources)
                20    LocalQ     test eiarussi PD       0:00      1 (Priority)
                21    LocalQ     test eiarussi PD       0:00      1 (Priority)
                18    LocalQ     test eiarussi  R       0:02      1 lab04</code></pre>
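<p>To make the state of the queue easier to read at a glance, a small helper like the one below can count jobs per value of the <code>ST</code> column. It is a sketch that assumes the default <code>squeue</code> layout shown above, with <code>ST</code> as the fifth column; <code>count_states</code> is a made-up name. As far as I understand from the Slurm gang scheduling documentation, time-sliced jobs that are currently paused should show up as <code>S</code> (suspended) rather than <code>PD</code>.</p>

```shell
#!/bin/bash
# Count jobs per state in squeue-style output read from stdin.
# Assumes the default squeue column layout, where ST is the 5th field.
count_states() {
    # Skip the header line, ignore lines without a 5th field,
    # then print "STATE COUNT" pairs sorted by state name.
    awk 'NR != 1 { if ($5 != "") n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

<p>Usage: <code>squeue | count_states</code>. On the two listings above this gives only <code>PD</code> and <code>R</code> entries, with no <code>S</code> at all.</p>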


</div><div>Why did the first job run for 30+ seconds uninterrupted, instead of 
another job being allowed to run after 15 seconds? Am I misunderstanding 
how GANG scheduling works, or is there a problem in my conf file?



</div>



</div>