I'm sorry, but I still don't get it.
Isn't --nodes=2,4 telling slurm to allocate 2 OR 4 nodes and nothing else?
So, if:
--nodes=2 allocates only two nodes
--nodes=4 allocates only four nodes
--nodes=1-2 allocates min one and max two nodes
--nodes=1-4 allocates min one and max four nodes
what is the allocation rule for --nodes=2,4 which is the so-called size_string allocation?
man sbatch says:
Node count can also be specified as size_string. The size_string specification identifies what nodes
values should be used. Multiple values may be specified using a comma separated list or with a step
function by suffix containing a colon and number values with a "-" separator.
For example, "--nodes=1-15:4" is equivalent to "--nodes=1,5,9,13".
...
The job will be allocated as many nodes as possible within the range specified and without delaying the
initiation of the job.
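
To make the quoted step-function example concrete, here is a minimal Python sketch that expands a size_string into the set of node counts it names. It is not Slurm's own parser: the function name expand_size_string is made up for illustration, and treating a plain "low-high" item without a step as "every value in between" is my assumption, not something the man page states.

```python
from typing import List


def expand_size_string(spec: str) -> List[int]:
    """Expand a size_string such as "2,4" or "1-15:4" into sorted node counts.

    Hypothetical helper for illustration only; not part of Slurm.
    """
    counts = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # "low-high:step" form; the step after the colon is optional here.
            bounds, _, step_text = part.partition(":")
            low_text, high_text = bounds.split("-", 1)
            # Assumption: a missing step means every value from low to high.
            step = int(step_text) if step_text else 1
            counts.update(range(int(low_text), int(high_text) + 1, step))
        else:
            # A single value such as "2".
            counts.add(int(part))
    return sorted(counts)


print(expand_size_string("1-15:4"))      # [1, 5, 9, 13]
print(expand_size_string("2,4"))         # [2, 4]
print(3 in expand_size_string("2,4"))    # False
```

Read this way, "--nodes=2,4" names only the counts 2 and 4, which is why an allocation of 3 nodes looks surprising to me.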
________________________________
From: Brian Andrus via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, August 29, 2024 7:27:44 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=<size_string>
It looks to me that you requested 3 tasks spread across 2 to 4 nodes. Realize that --nodes does not target nodes named 2 and 4; it is a count of how many nodes to use. You only needed 3 tasks/CPUs, so that is what you were allocated, and since you have 1 CPU per node, you get 3 (of up to 4) nodes. Slurm does not give you 4 nodes because you only want 3 tasks.
You see the result in your variables:
SLURM_NNODES=3
SLURM_JOB_CPUS_PER_NODE=1(x3)
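
For anyone unsure how to read the compact "1(x3)" value, here is a small Python sketch that expands it into one entry per node. The helper name expand_per_node is hypothetical, and the sketch assumes the value is a comma-separated list of "count" or "count(xrepeat)" items.

```python
import re
from typing import List


def expand_per_node(value: str) -> List[int]:
    """Expand a compact per-node value such as "1(x3)" into one entry per node.

    Hypothetical helper for illustration only; not part of Slurm.
    """
    per_node = []
    for item in value.split(","):
        # Each item is either "count" or "count(xrepeat)".
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        count = int(match.group(1))
        repeat = int(match.group(2) or 1)
        per_node.extend([count] * repeat)
    return per_node


cpus = expand_per_node("1(x3)")
print(cpus)                  # [1, 1, 1]
print(len(cpus), sum(cpus))  # 3 3
```

So "1(x3)" reads as 1 CPU on each of 3 nodes: 3 nodes and 3 CPUs in total, matching the 3 tasks requested.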
If you only want 2 nodes, make --nodes=2.
Brian Andrus
On 8/29/24 08:00, Matteo Guglielmi via slurm-users wrote:
Hi,
On sbatch's manpage there is this example for <size_string>:
--nodes=1,5,9,13
so either one specifies <minnodes>[-maxnodes] OR <size_string>.
I checked the logs, and there are no reported errors about wrong or ignored options.
MG
________________________________
From: Brian Andrus via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, August 29, 2024 4:11:25 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=<size_string>
Your --nodes line is incorrect:
-N, --nodes=<minnodes>[-maxnodes]|<size_string>
    Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes.
Looks like it ignored that and used ntasks with ntasks-per-node as 1, giving you 3 nodes. Check your logs, and check your conf to see what your defaults are.
Brian Andrus
On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote:
Hello,
I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four Amd nodes (node[05-08], Feature=amd).
# job file
#SBATCH --ntasks=3
#SBATCH --nodes=2,4
#SBATCH --constraint="[intel|amd]"
env | grep SLURM
# slurm.conf
PartitionName=DEFAULT MinNodes=1 MaxNodes=UNLIMITED
# log
SLURM_JOB_USER=software
SLURM_TASKS_PER_NODE=1(x3)
SLURM_JOB_UID=1002
SLURM_TASK_PID=49987
SLURM_LOCALID=0
SLURM_SUBMIT_DIR=/home/software
SLURMD_NODENAME=node01
SLURM_JOB_START_TIME=1724932865
SLURM_CLUSTER_NAME=cluster
SLURM_JOB_END_TIME=1724933465
SLURM_CPUS_ON_NODE=1
SLURM_JOB_CPUS_PER_NODE=1(x3)
SLURM_GTIDS=0
SLURM_JOB_PARTITION=nodes
SLURM_JOB_NUM_NODES=3
SLURM_JOBID=26
SLURM_JOB_QOS=lprio
SLURM_PROCID=0
SLURM_NTASKS=3
SLURM_TOPOLOGY_ADDR=node01
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_MEM_PER_CPU=0
SLURM_NODELIST=node[01-03]
SLURM_JOB_ACCOUNT=dalco
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=3
SLURM_NNODES=3
SLURM_SUBMIT_HOST=master
SLURM_JOB_ID=26
SLURM_NODEID=0
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_NAME=mpijob
SLURM_JOB_GID=1002
SLURM_JOB_NODELIST=node[01-03] <<<=== why three nodes? Shouldn't this still be two nodes?
Thank you.