[slurm-users] Simultaneously running multiple jobs on same node

Jan van der Laan slurm at eoos.dds.nl
Tue Nov 24 08:58:49 UTC 2020


Hi Alex,

Thanks a lot. I suspected it was something trivial.

ubuntu at ip-172-31-12-211:~$ scontrol show config | grep -i defmem
DefMemPerNode           = UNLIMITED


Specifying `sbatch --mem=1M job.sh` works. I will probably set a 
default value in slurm.conf (just tried; that also helps).
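For the record, the setting involved is the default memory request in
slurm.conf. A minimal sketch (the value 1800 is only an illustration,
sized for a 4-CPU node with ~7860 MB of RealMemory; pick what fits your
nodes):

```
# Default memory (MB) per node for jobs that do not pass --mem.
# Without a finite default, CR_CPU_Memory charges each job the whole
# node's memory, so jobs serialize even when CPUs are free.
DefMemPerNode=1800

# Alternatively, a per-CPU default scales with the job's CPU request:
# DefMemPerCPU=450
```

Either way, an explicit `--mem`/`--mem-per-cpu` on the sbatch line still
overrides the default.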

Best,
Jan

On 23-11-2020 22:15, Alex Chekholko wrote:
> Hi,
> 
> Your job does not request any specific amount of memory, so it gets the 
> default request.  I believe the default request is all the RAM in the node.
> 
> Try something like:
> $ scontrol show config | grep -i defmem
> DefMemPerNode           = 64000
> 
> Regards,
> Alex
> 
> 
> On Mon, Nov 23, 2020 at 12:33 PM Jan van der Laan <slurm at eoos.dds.nl 
> <mailto:slurm at eoos.dds.nl>> wrote:
> 
>     Hi,
> 
>     I am having issues getting slurm to run multiple jobs in parallel
>     on the same machine.
> 
>     Most of our jobs are either (relatively) low on CPU and high on
>     memory (data processing) or low on memory and high on CPU
>     (simulations). The server we have is generally big enough (256 GB
>     memory; 16 cores) to accommodate multiple jobs running at the same
>     time, and we would like to use slurm to schedule these jobs.
>     However, testing on a small (4 CPU) Amazon server, I am unable to
>     get this working. As far as I know, I need to use
>     `SelectType=select/cons_res` and
>     `SelectTypeParameters=CR_CPU_Memory`. However, when I start
>     multiple jobs that each use a single CPU, they run sequentially
>     rather than in parallel.
> 
>     My `slurm.conf`
> 
>     ===
>     ControlMachine=ip-172-31-37-52
> 
>     MpiDefault=none
>     ProctrackType=proctrack/pgid
>     ReturnToService=1
>     SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
>     SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
>     SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
>     SlurmUser=slurm
>     StateSaveLocation=/var/lib/slurm-llnl/slurmctld
>     SwitchType=switch/none
>     TaskPlugin=task/none
> 
>     # SCHEDULING
>     FastSchedule=1
>     SchedulerType=sched/backfill
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_CPU_Memory
> 
>     # LOGGING AND ACCOUNTING
>     AccountingStorageType=accounting_storage/none
>     ClusterName=cluster
>     JobAcctGatherType=jobacct_gather/none
>     SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
>     SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
> 
>     # COMPUTE NODES
>     NodeName=ip-172-31-37-52 CPUs=4 RealMemory=7860 CoresPerSocket=2
>     ThreadsPerCore=2 State=UNKNOWN
>     PartitionName=test Nodes=ip-172-31-37-52 Default=YES MaxTime=INFINITE
>     State=UP
>     ====
> 
>     `job.sh`
>     ===
>     #!/bin/bash
>     sleep 30
>     env
>     ===
> 
>     Output when running jobs:
>     ===
>     ubuntu at ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
>     Submitted batch job 2
>     ubuntu at ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
>     Submitted batch job 3
>     ubuntu at ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
>     Submitted batch job 4
>     ubuntu at ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
>     Submitted batch job 5
>     ubuntu at ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
>     Submitted batch job 6
>     ubuntu at ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
>     Submitted batch job 7
>     ubuntu at ip-172-31-37-52:~$ squeue
>                    JOBID PARTITION     NAME     USER ST       TIME  NODES
>     NODELIST(REASON)
>                        3      test   job.sh   ubuntu PD       0:00      1
>     (Resources)
>                        4      test   job.sh   ubuntu PD       0:00      1
>     (Priority)
>                        5      test   job.sh   ubuntu PD       0:00      1
>     (Priority)
>                        6      test   job.sh   ubuntu PD       0:00      1
>     (Priority)
>                        7      test   job.sh   ubuntu PD       0:00      1
>     (Priority)
>                        2      test   job.sh   ubuntu  R       0:03      1
>     ip-172-31-37-52
>     ===
> 
>     The jobs run sequentially, while in principle it should be possible
>     to run 4 jobs in parallel. I am probably missing something simple.
>     How do I get this to work?
> 
>     Best,
>     Jan
> 


