[slurm-users] Reserving cores without immediately launching tasks on all of them

Renfro, Michael Renfro at tntech.edu
Fri Nov 26 23:18:49 UTC 2021


Nodes are probably misconfigured in slurm.conf, yes. You can use the output of 'slurmd -C' on a compute node as a starting point for what your NodeName entry in slurm.conf should be:


[root at node001 ~]# slurmd -C
NodeName=node001 CPUs=28 Boards=1 SocketsPerBoard=2 CoresPerSocket=14 ThreadsPerCore=1 RealMemory=64333
UpTime=161-22:35:13

[root at node001 ~]# grep -i 'nodename=node\[001' /etc/slurm/slurm.conf
NodeName=node[001-022]  CoresPerSocket=14 RealMemory=62000 Sockets=2 ThreadsPerCore=1 Weight=10201


Make sure that RealMemory in slurm.conf is no larger than what 'slurmd -C' reports. If I recall correctly, my slurm.conf settings are otherwise equivalent to, though not word-for-word identical with, what 'slurmd -C' reports (I specified Sockets instead of both Boards and SocketsPerBoard, for example).
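
As a concrete (hypothetical) example: if n020 really has 2 sockets of 12 cores each and roughly 128 GB of RAM, which the FreeMem value in the output below suggests but which only 'slurmd -C' on that node can confirm, the corrected entry would look something like

NodeName=n[001-020] Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=126000 Weight=1

The Sockets=24 CoresPerSocket=1 RealMemory=1 currently reported is typically what you get when the NodeName line leaves out the socket, core, and memory details entirely.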

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Date: Friday, November 26, 2021 at 1:22 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Reserving cores without immediately launching tasks on all of them

Mike,

I’m working through your suggestions. I tried

$ salloc --ntasks=20 --cpus-per-task=24 --verbose myscript.bash

but salloc says that the resources are not available:

salloc: defined options
salloc: -------------------- --------------------
salloc: cpus-per-task       : 24
salloc: ntasks              : 20
salloc: verbose             : 1
salloc: -------------------- --------------------
salloc: end of defined options
salloc: Linear node selection plugin loaded with argument 4
salloc: select/cons_res loaded with argument 4
salloc: Cray/Aries node selection plugin loaded
salloc: select/cons_tres loaded with argument 4
salloc: Granted job allocation 34299
srun: error: Unable to create step for job 34299: Requested node configuration is not available

$ scontrol show nodes  /* oddly says that there is one core per socket. Could our nodes be misconfigured? */

NodeName=n020 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=n020 NodeHostName=n020 Version=20.02.3
   OS=Linux 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Mon Jun 14 17:25:42 EDT 2021
   RealMemory=1 AllocMem=0 FreeMem=126431 Sockets=24 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=normal,low,high
   BootTime=2021-11-18T08:43:44 SlurmdStartTime=2021-11-18T08:44:31
   CfgTRES=cpu=24,mem=1M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s



From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Renfro, Michael
Sent: Friday, November 26, 2021 8:15 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [EXTERNAL] Re: [slurm-users] Reserving cores without immediately launching tasks on all of them

The end of the MPICH section at [1] shows an example using salloc [2].

Worst case, you should be able to take the output of “scontrol show hostnames” [3] and use it to build mpiexec parameters that run one rank per node, similar to what’s shown at the end of the synopsis section of [4]; a sketch follows the links below.

[1] https://slurm.schedmd.com/mpi_guide.html#mpich2
[2] https://slurm.schedmd.com/salloc.html
[3] https://slurm.schedmd.com/scontrol.html
[4] https://www.mpich.org/static/docs/v3.1/www1/mpiexec.html
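
For the worst-case route, a minimal sketch from inside the allocation (the mpiexec flags here are MPICH Hydra's, and ./manager is a placeholder for your program):

# Expand Slurm's compressed node list into a comma-separated host list,
# then ask mpiexec for one rank (one manager) per node.
HOSTS=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -)
mpiexec -hosts "$HOSTS" -ppn 1 ./manager
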
--
Mike Renfro, PhD  / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University



On Nov 25, 2021, at 12:45 PM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov> wrote:


I want to launch an MPICH job with sbatch with one task per node (each a manager), while also reserving a certain number of cores on each node for the managers to fill up with spawned workers (via MPI_Comm_spawn). I’d like to avoid using --exclusive.

I tried the arguments --ntasks=20 --cpus-per-task=24, but it appears that 20 * 24 tasks will be launched. Is there a way to reserve cores without immediately launching tasks on them? Thanks for any help.

sbatch: defined options
sbatch: -------------------- --------------------
sbatch: cpus-per-task       : 24
sbatch: ignore-pbs          : set
sbatch: ntasks              : 20
sbatch: test-only           : set
sbatch: verbose             : 1
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: Linear node selection plugin loaded with argument 4
sbatch: select/cons_res loaded with argument 4
sbatch: Cray/Aries node selection plugin loaded
sbatch: select/cons_tres loaded with argument 4
sbatch: Job 34274 to start at 2021-11-25T12:15:05 using 480 processors on nodes n[001-020] in partition normal
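
In sbatch terms, roughly what I’m hoping to express is the sketch below (the node and core counts mirror my attempt above; ./manager is a placeholder, and I haven’t verified how MPI_Comm_spawn behaves under our MPICH/Slurm setup):

#!/bin/bash
#SBATCH --nodes=20             # one manager task per node...
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24     # ...with the node's cores reserved for that manager's spawned workers

srun ./manager                 # launches only the 20 managers; workers come later via MPI_Comm_spawn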

