Hi,

Your job does not request any specific amount of memory, so it gets the
default request. I believe the default request is all the RAM in the
node, which would explain why only one job runs at a time.

Try something like:
$ scontrol show config | grep -i defmem
DefMemPerNode = 64000

A short sketch of a possible fix follows below the quoted message.

Regards,
Alex

On Mon, Nov 23, 2020 at 12:33 PM Jan van der Laan <slurm@eoos.dds.nl> wrote:
Hi,

I am having issues getting Slurm to run multiple jobs in parallel on the
same machine.

Most of our jobs are either (relatively) low on CPU and high on memory
(data processing) or low on memory and high on CPU (simulations). The
server we have is generally big enough (256 GB of memory; 16 cores) to
accommodate multiple jobs running at the same time, and we would like to
use Slurm to schedule these jobs. However, testing on a small (4 CPU)
Amazon server, I am unable to get this working. As far as I know, I need
`SelectType=select/cons_res` and `SelectTypeParameters=CR_CPU_Memory`.
However, when I start multiple jobs that each use a single CPU, they run
sequentially rather than in parallel.

My `slurm.conf`

===
ControlMachine=ip-172-31-37-52

MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none

# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

# COMPUTE NODES
NodeName=ip-172-31-37-52 CPUs=4 RealMemory=7860 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
PartitionName=test Nodes=ip-172-31-37-52 Default=YES MaxTime=INFINITE State=UP
===

`job.sh`
===
#!/bin/bash
sleep 30
env
===

Output when running jobs:
===
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 2
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 3
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 4
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 5
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 6
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 7
ubuntu@ip-172-31-37-52:~$ squeue
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
      3      test   job.sh   ubuntu PD   0:00      1 (Resources)
      4      test   job.sh   ubuntu PD   0:00      1 (Priority)
      5      test   job.sh   ubuntu PD   0:00      1 (Priority)
      6      test   job.sh   ubuntu PD   0:00      1 (Priority)
      7      test   job.sh   ubuntu PD   0:00      1 (Priority)
      2      test   job.sh   ubuntu  R   0:03      1 ip-172-31-37-52
===

The jobs are run sequentially, while in principle it should be possible
to run 4 jobs in parallel. I am probably missing something simple. How
do I get this to work?

Best,
Jan
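P.S. To expand on the DefMem point above: for several jobs to share the
node, each job needs a memory request smaller than the whole node. The
following is only a sketch; the 1000 MB figures are arbitrary example
values, not taken from your configuration.

===
# Option 1: give each job an explicit memory request at submit time
# (--mem is the job's per-node memory limit, in MB by default)
$ sbatch -n1 -N1 --mem=1000 job.sh

# Option 2: set a default memory request per allocated CPU in slurm.conf,
# so that single-CPU jobs no longer claim all of the node's memory
DefMemPerCPU=1000
===

After editing slurm.conf, the daemons need to pick up the new
configuration, e.g. via `scontrol reconfigure`.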