[slurm-users] Running multi jobs on one CPU in parallel

Karl Lovink karl at lovink.net
Tue Sep 14 19:34:24 UTC 2021


Hello,

I am in the process of setting up our SLURM environment. We want to use
SLURM during our DDoS exercises to dispatch DDoS attack scripts. We
need a large number of jobs running in parallel across a total of 3 nodes.
I can't get it to run more than 128 jobs simultaneously. There are 128
CPUs in each compute node.

How can I ensure that I can run more jobs in parallel than there are
CPUs in the compute node?

Thanks
Karl


My srun script is:
srun --exclusive --nodes 3 --ntasks 384 /ddos/demo/showproc.sh

And my slurm.conf file:
ClusterName=ddos-cluster
ControlMachine=slurm
SlurmUser=ddos
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/opt/slurm/spool/ctld
SlurmdSpoolDir=/opt/slurm/spool/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/opt/slurm/run/.pid
SlurmdPidFile=/opt/slurm/run/slurmd.pid
ProctrackType=proctrack/pgid
PluginDir=/opt/slurm/lib/slurm
ReturnToService=2
TaskPlugin=task/none
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill

SelectType=select/cons_tres
SelectTypeParameters=CR_Core

SlurmctldDebug=3
SlurmctldLogFile=/opt/slurm/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/opt/slurm/log/slurmd.log
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/none
AccountingStorageTRES=gres/gpu
DebugFlags=CPU_Bind,gres
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageUser=slurm
SlurmctldParameters=enable_configurable
GresTypes=gpu
DefMemPerNode=256000
NodeName=aivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
NodeName=mivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
NodeName=fiod CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
PartitionName=ddos Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=adhoc Nodes=ALL Default=YES MaxTime=INFINITE State=UP
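
For reference, the numbers in the srun command and the NodeName lines above
can be checked with a short shell sketch. It only does the arithmetic; it
does not query SLURM. The variable names are made up for illustration, and
the values are copied from this post.

```shell
#!/bin/sh
# Values copied from the srun command and the NodeName lines in this post.
# Variable names are illustrative only; they are not SLURM settings.
nodes=3
ntasks=384
sockets_per_board=2
cores_per_socket=16
threads_per_core=4

# Physical cores per node (Boards=1, so sockets per board = sockets per node).
cores_per_node=$((sockets_per_board * cores_per_socket))

# "CPUs" in the node definition counts hardware threads.
cpus_per_node=$((cores_per_node * threads_per_core))

# Tasks that land on each node if srun spreads them evenly.
tasks_per_node=$((ntasks / nodes))

echo "cores per node:           $cores_per_node"
echo "CPUs (threads) per node:  $cpus_per_node"
echo "tasks requested per node: $tasks_per_node"
echo "total CPUs in cluster:    $((cpus_per_node * nodes))"
```

So the request of 384 tasks on 3 nodes comes to 128 tasks per node, which is
exactly the CPUs=128 configured per node (32 physical cores with 4 threads
each).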


