[slurm-users] Reserving cores without immediately launching tasks on all of them

Renfro, Michael Renfro at tntech.edu
Fri Nov 26 14:14:50 UTC 2021


The end of the MPICH section at [1] shows an example using salloc [2].
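For your case that might look roughly like this (an untested sketch, not a recipe: the node and core counts come from your message below, "./manager" stands in for your executable, and it assumes MPICH's Hydra launcher picks up the Slurm allocation):

    # ask for 20 nodes, one manager task per node, 24 cores per task
    salloc --nodes=20 --ntasks-per-node=1 --cpus-per-task=24
    # inside the allocation: start one manager rank per node and let
    # MPI_Comm_spawn fill the remaining reserved cores with workers
    mpiexec -n 20 -ppn 1 ./manager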

Worst case, you should be able to take the output of “scontrol show hostnames” [3] and build mpiexec command-line parameters from it so that one rank runs per node, similar to what’s shown at the end of the synopsis section of [4].
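Something along these lines might do it (again only a sketch; "./manager" is a placeholder):

    # expand the allocation's node list and join it with commas
    HOSTS=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -)
    # one manager rank per node; spawned workers use the remaining cores
    mpiexec -hosts "$HOSTS" -n "$SLURM_JOB_NUM_NODES" -ppn 1 ./manager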

[1] https://slurm.schedmd.com/mpi_guide.html#mpich2
[2] https://slurm.schedmd.com/salloc.html
[3] https://slurm.schedmd.com/scontrol.html
[4] https://www.mpich.org/static/docs/v3.1/www1/mpiexec.html

--
Mike Renfro, PhD  / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

On Nov 25, 2021, at 12:45 PM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov> wrote:



I want to launch an MPICH job with sbatch with one task per node (each a manager), while also reserving a certain number of cores on each node for the managers to fill up with spawned workers (via MPI_Comm_spawn). I’d like to avoid using --exclusive.

I tried the arguments --ntasks=20 --cpus-per-task=24, but it appears that 20 * 24 tasks will be launched. Is there a way to reserve cores without immediately launching tasks on them? Thanks for any help.

sbatch: defined options
sbatch: -------------------- --------------------
sbatch: cpus-per-task       : 24
sbatch: ignore-pbs          : set
sbatch: ntasks              : 20
sbatch: test-only           : set
sbatch: verbose             : 1
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: Linear node selection plugin loaded with argument 4
sbatch: select/cons_res loaded with argument 4
sbatch: Cray/Aries node selection plugin loaded
sbatch: select/cons_tres loaded with argument 4
sbatch: Job 34274 to start at 2021-11-25T12:15:05 using 480 processors on nodes n[001-020] in partition normal