<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">

The simplest approach might be to run multiple processes within each batch job.<br>

<br>
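A minimal sketch of that idea, assuming a POSIX shell batch script. The helper name run_many, the #SBATCH values and the count of 4 are illustrative assumptions, and "echo" stands in for the real payload (/ddos/demo/showproc.sh in the quoted message).<br>
<pre>
```shell
#!/bin/sh
# Hypothetical batch script; the directive values are illustrative.
#SBATCH --nodes=1
#SBATCH --ntasks=1

# run_many COUNT CMD [ARGS...]
# Start COUNT background copies of CMD, then wait for all of them.
run_many() {
    count="$1"
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait    # returns once every background copy has exited
}

# Placeholder payload; swap in the real script on the cluster.
run_many 4 echo "worker started"
```
</pre>
Because the copies are ordinary background processes inside one task, the scheduler sees a single task, and the processes share whatever CPUs the job was allocated.<br>
<br>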

</div>

<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">

Gareth</div>

<div><br>

</div>


<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Emre Brookes &lt;emre.brookes@mso.umt.edu&gt;<br>

<b>Sent:</b> Wednesday, September 15, 2021 6:42:24 AM<br>

<b>To:</b> Karl Lovink &lt;karl@lovink.net&gt;; Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>

<b>Subject:</b> Re: [slurm-users] Running multi jobs on one CPU in parallel</font>

<div> </div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">

<div class="PlainText">Hi Karl,<br>

<br>

I haven't tested the MAX_TASKS_PER_NODE limits.<br>

According to the slurm.conf documentation:<br>

<br>

*MaxTasksPerNode*<br>

    Maximum number of tasks Slurm will allow a job step to spawn on a<br>

    single node.<br>

    The default *MaxTasksPerNode* is 512. May not exceed 65533<br>

<br>

So I'd try setting that in slurm.conf and running "scontrol reconfigure"<br>

before attempting a recompile.<br>

Seems the documentation is inconsistent on this point.<br>
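<br>
A rough sketch of that suggestion, assuming a POSIX shell. The helper name set_max_tasks, the value 1024 and the slurm.conf path in the comments are illustrative assumptions, and whether a reconfigure alone is enough for the new limit to take effect is untested here.<br>
<pre>
```shell
# set_max_tasks CONF VALUE
# Set MaxTasksPerNode in the given slurm.conf, replacing an existing
# line or appending one. The match is case-sensitive for simplicity.
set_max_tasks() {
    conf="$1"
    value="$2"
    if grep -q '^MaxTasksPerNode=' "$conf"; then
        # Rewrite the existing setting in place, keeping a .bak copy.
        sed -i.bak "s/^MaxTasksPerNode=.*/MaxTasksPerNode=$value/" "$conf"
    else
        # No setting yet: append one at the end of the file.
        printf 'MaxTasksPerNode=%s\n' "$value" >> "$conf"
    fi
}

# Usage on the controller (the path is an assumption; adjust to your install):
#   set_max_tasks /opt/slurm/etc/slurm.conf 1024
#   scontrol reconfigure
```
</pre>
Per the documentation excerpt above, the value may not exceed 65533.<br>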

<br>

-Emre<br>

<br>

<br>

<br>

Karl Lovink wrote:<br>

> Hi Emre,<br>

><br>

> MAX_TASKS_PER_NODE is set to 512. Does this mean I cannot run more than<br>

> 512 jobs in parallel on one node? Or can I change MAX_TASKS_PER_NODE to<br>

> a higher value<br>

> and recompile Slurm?<br>

><br>

> Regards,<br>

> Karl<br>

><br>

><br>

> On 14/09/2021 21:47, Emre Brookes wrote:<br>

>> *-O*, *--overcommit*<br>

>>     Overcommit resources. When applied to job allocation, only one CPU<br>

>>     is allocated to the job per node and options used to specify the<br>

>>     number of tasks per node, socket, core, etc. are ignored. When<br>

>>     applied to job step allocations (the *srun* command when executed<br>

>>     within an existing job allocation), this option can be used to<br>

>>     launch more than one task per CPU. Normally, *srun* will not<br>

>>     allocate more than one process per CPU. By specifying *--overcommit*<br>

>>     you are explicitly allowing more than one process per CPU. However<br>

>>     no more than *MAX_TASKS_PER_NODE* tasks are permitted to execute per<br>

>>     node. NOTE: *MAX_TASKS_PER_NODE* is defined in the file /slurm.h/<br>

>>     and is not a variable, it is set at Slurm build time.<br>

>><br>

>> I have used this successfully to run more jobs than there are CPUs/cores available.<br>

>><br>

>> -e.<br>
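<br>
As a dry-run sketch of the option quoted above, assuming a POSIX shell: the figure of 256 tasks per node is an illustrative assumption (double the 128 CPUs per node, and below the MAX_TASKS_PER_NODE of 512 mentioned in this thread), and the command is only printed, not executed.<br>
<pre>
```shell
# Dry-run sketch: compose an overcommitted srun command line and print it.
nodes=3
tasks_per_node=256    # assumption: double the 128 CPUs per node
ntasks=$((nodes * tasks_per_node))
cmd="srun --overcommit --nodes $nodes --ntasks $ntasks /ddos/demo/showproc.sh"
# Printed, not executed, so the sketch is safe to run off-cluster.
echo "$cmd"
```
</pre>
If the tasks are spread evenly, this would place roughly 256 on each of the 3 nodes.<br>
<br>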

>><br>

>><br>

>><br>

>> Karl Lovink wrote:<br>

>>> Hello,<br>

>>><br>

>>> I am in the process of setting up our SLURM environment. We want to use<br>

>>> SLURM during our DDoS exercises for dispatching DDoS attack scripts. We<br>

>>> need a lot of parallel running jobs on a total of 3 nodes. I can't get it<br>

>>> to run more than 128 jobs simultaneously. There are 128 CPUs in each<br>

>>> compute node.<br>

>>><br>

>>> How can I ensure that I can run more jobs in parallel than there are<br>

>>> CPUs in the compute node?<br>

>>><br>

>>> Thanks<br>

>>> Karl<br>

>>><br>

>>><br>

>>> My srun script is:<br>

>>> srun --exclusive --nodes 3 --ntasks 384 /ddos/demo/showproc.sh<br>

>>><br>

>>> And my slurm.conf file:<br>

>>> ClusterName=ddos-cluster<br>

>>> ControlMachine=slurm<br>

>>> SlurmUser=ddos<br>

>>> SlurmctldPort=6817<br>

>>> SlurmdPort=6818<br>

>>> AuthType=auth/munge<br>

>>> StateSaveLocation=/opt/slurm/spool/ctld<br>

>>> SlurmdSpoolDir=/opt/slurm/spool/d<br>

>>> SwitchType=switch/none<br>

>>> MpiDefault=none<br>

>>> SlurmctldPidFile=/opt/slurm/run/.pid<br>

>>> SlurmdPidFile=/opt/slurm/run/slurmd.pid<br>

>>> ProctrackType=proctrack/pgid<br>

>>> PluginDir=/opt/slurm/lib/slurm<br>

>>> ReturnToService=2<br>

>>> TaskPlugin=task/none<br>

>>> SlurmctldTimeout=300<br>

>>> SlurmdTimeout=300<br>

>>> InactiveLimit=0<br>

>>> MinJobAge=300<br>

>>> KillWait=30<br>

>>> Waittime=0<br>

>>> SchedulerType=sched/backfill<br>

>>><br>

>>> SelectType=select/cons_tres<br>

>>> SelectTypeParameters=CR_Core<br>

>>><br>

>>> SlurmctldDebug=3<br>

>>> SlurmctldLogFile=/opt/slurm/log/slurmctld.log<br>

>>> SlurmdDebug=3<br>

>>> SlurmdLogFile=/opt/slurm/log/slurmd.log<br>

>>> JobCompType=jobcomp/none<br>

>>> JobAcctGatherType=jobacct_gather/none<br>

>>> AccountingStorageTRES=gres/gpu<br>

>>> DebugFlags=CPU_Bind,gres<br>

>>> AccountingStorageType=accounting_storage/slurmdbd<br>

>>> AccountingStorageHost=localhost<br>

>>> AccountingStoragePass=/var/run/munge/munge.socket.2<br>

>>> AccountingStorageUser=slurm<br>

>>> SlurmctldParameters=enable_configurable<br>

>>> GresTypes=gpu<br>

>>> DefMemPerNode=256000<br>

>>> NodeName=aivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16<br>

>>> ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN<br>

>>> NodeName=mivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16<br>

>>> ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN<br>

>>> NodeName=fiod CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16<br>

>>> ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN<br>

>>> PartitionName=ddos Nodes=ALL Default=YES MaxTime=INFINITE State=UP<br>

>>> PartitionName=adhoc Nodes=ALL Default=YES MaxTime=INFINITE State=UP<br>

>>><br>


>>><br>


><br>

<br>

<br>

</div>

</span></font></div>

</body>

</html>