Hi George,
George Leaver via slurm-users <slurm-users@lists.schedmd.com> writes:
Hello,
Previously we were running 22.05.10 and could submit a "multinode" job using only the total number of cores to run, not the number of nodes. For example, in a cluster containing only 40-core nodes (no hyperthreading), Slurm would determine that two nodes were needed given only:

    sbatch -p multinode -n 80 --wrap="...."
Now in 23.02.1 this is no longer the case - we get:

    sbatch: error: Batch job submission failed: Node count specification invalid
At least -N 2 must now be used (-n 80 can be added):

    sbatch -p multinode -N 2 -n 80 --wrap="...."
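If an exact node count is more rigid than needed, the sbatch man page also documents a range form, --nodes=<minnodes>[-maxnodes]; a sketch of that (untested here against 23.02.1):

    sbatch -p multinode -N 2-4 -n 80 --wrap="...."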
The partition config was, and is, as follows (MinNodes=2 is there to reject small jobs submitted to this partition - we want at least two nodes requested):

    PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE Default=NO
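For completeness, the partition limits as seen by the running controller (standard Slurm client tooling, nothing cluster-specific assumed) can be dumped with:

    scontrol show partition multinode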
But do you really want to force a job to use two nodes if it could in fact run on one?
What is the use-case for having separate 'uninode' and 'multinode' partitions? We have a university cluster with a very wide range of jobs and essentially a single partition. Allowing all job types to use one partition means that the different resource requirements tend to complement each other to some degree. Doesn't splitting your jobs across two partitions mean that one of them could be full while the other has idle nodes?
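For illustration only, a single catch-all partition along those lines (hypothetical name; the other values carried over from the multinode config above) might look like:

    PartitionName=batch State=UP Nodes=node[081-245] Default=YES DefaultTime=168:00:00 MaxTime=168:00:00 DefMemPerCPU=4096

with any per-group caps applied through QOS limits rather than partition boundaries.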
Cheers,
Loris
All nodes are of the form:

    NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000
slurm.conf has:

    EnforcePartLimits = ANY
    SelectType        = select/cons_tres
    TaskPlugin        = task/cgroup,task/affinity
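The values the controller is actually running with can be confirmed (again, standard Slurm commands only) with:

    scontrol show config | grep -E 'EnforcePartLimits|SelectType|TaskPlugin'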
A few fields from sacctmgr show qos multinode:

    Name|Flags|MaxTRES
    multinode|DenyOnLimit|node=5
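For reference, that pipe-separated form is what a parsable query along these lines should print, assuming the usual sacctmgr options:

    sacctmgr -P show qos multinode format=Name,Flags,MaxTRES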
The sbatch/srun man page states:

    -n, --ntasks
        .... If -N is not specified, the default behavior is to allocate enough nodes to satisfy the requested resources as expressed by per-job specification options, e.g. -n, -c and --gpus.
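A minimal way to check the documented behaviour against what a given version actually does (hypothetical test job; %D in the squeue format string prints the allocated node count):

    sbatch -p multinode -n 80 --wrap="srun hostname"
    squeue --me -o "%i %D"

Under 22.05.10 Slurm allocated two nodes from this; under 23.02.1 the submission itself is rejected as above.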
I've had a look through the release notes back to 22.05.10 but can't see anything obvious (to me). Has this behaviour changed? Or, more likely, what have I missed? ;-)
Many thanks, George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch