Hello,
Previously we were running 22.05.10 and could submit a "multinode" job using only the total number of cores to run, not the number of nodes. For example, in a cluster containing only 40-core nodes (no hyperthreading), Slurm would determine two nodes were needed with only: sbatch -p multinode -n 80 --wrap="...."
Now in 23.02.1 this is no longer the case - we get: sbatch: error: Batch job submission failed: Node count specification invalid
At least -N 2 must be used (-n 80 can be added): sbatch -p multinode -N 2 -n 80 --wrap="...."
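For reference, the equivalent jobscript form (illustrative only; the application name is just a placeholder):

#!/bin/bash
#SBATCH -p multinode
#SBATCH -N 2
#SBATCH -n 80
# placeholder application - substitute the real MPI program
srun ./my_mpi_app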
The partition config was, and is, as follows (MinNodes=2 to reject small jobs submitted to this partition - we want at least two nodes requested):

PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE Default=NO
All nodes are of the form:

NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000
slurm.conf has:

EnforcePartLimits = ANY
SelectType = select/cons_tres
TaskPlugin = task/cgroup,task/affinity
A few fields from sacctmgr show qos multinode:

Name|Flags|MaxTRES
multinode|DenyOnLimit|node=5
The sbatch/srun man page states: -n, --ntasks .... If -N is not specified, the default behavior is to allocate enough nodes to satisfy the requested resources as expressed by per-job specification options, e.g. -n, -c and --gpus.
I've had a look through release notes back to 22.05.10 but can't see anything obvious (to me).
Has this behaviour changed? Or, more likely, what have I missed ;-) ?
Many thanks, George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch
Hi George,
George Leaver via slurm-users <slurm-users@lists.schedmd.com> writes:
> Hello,
> Previously we were running 22.05.10 and could submit a "multinode" job using only the total number of cores to run, not the number of nodes. For example, in a cluster containing only 40-core nodes (no hyperthreading), Slurm would determine two nodes were needed with only: sbatch -p multinode -n 80 --wrap="...."
> Now in 23.02.1 this is no longer the case - we get: sbatch: error: Batch job submission failed: Node count specification invalid
> At least -N 2 must be used (-n 80 can be added): sbatch -p multinode -N 2 -n 80 --wrap="...."
> The partition config was, and is, as follows (MinNodes=2 to reject small jobs submitted to this partition - we want at least two nodes requested): PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE Default=NO
But do you really want to force a job to use two nodes if it could in fact run on one?
What is the use-case for having separate 'uninode' and 'multinode' partitions? We have a university cluster with a very wide range of jobs and essentially a single partition. Allowing all job types to use one partition means that the different resource requirements tend to complement each other to some degree. Doesn't splitting up your jobs over two partitions mean that either one of the two partitions could be full, while the other has idle nodes?
Cheers,
Loris
Hi Loris,
> Doesn't splitting up your jobs over two partitions mean that either one of the two partitions could be full, while the other has idle nodes?
Yes, potentially, and we may move away from our current config at some point (it's a bit of a hangover from an SGE cluster.) Hasn't really been an issue at the moment.
Do you find fragmentation a problem? Or do you just let the bf scheduler handle that (assuming jobs have a realistic wallclock request?)
But for now, would be handy if users didn't need to adjust their jobscripts (or we didn't need to write a submit script.)
Regards, George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch
Hi George,
George Leaver via slurm-users <slurm-users@lists.schedmd.com> writes:
> Hi Loris,
>> Doesn't splitting up your jobs over two partitions mean that either one of the two partitions could be full, while the other has idle nodes?
> Yes, potentially, and we may move away from our current config at some point (it's a bit of a hangover from an SGE cluster.) Hasn't really been an issue at the moment.
> Do you find fragmentation a problem? Or do you just let the bf scheduler handle that (assuming jobs have a realistic wallclock request?)
Well, no: with essentially only one partition we don't have fragmentation related to that. We did use to have multiple partitions for different run-times, and we did have fragmentation then. However, I couldn't see any advantage in that setup, so we moved to one partition and various QOS to handle, say, test or debug jobs. That said, users do still sometimes add potentially arbitrary conditions to their job scripts, such as the number of nodes for MPI jobs. Whereas in principle it may be a good idea to reduce the MPI overhead by reducing the number of nodes, in practice any such advantage may well be cancelled out or exceeded by the extra time the job has to wait for those specific resources.
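For illustration, a test/debug QOS in that style might be set up along these lines (the QOS name and limits here are made up, not our actual values):

# create the QOS and give it short-run limits (example values only)
sacctmgr add qos debug
sacctmgr modify qos debug set MaxWall=01:00:00 MaxTRESPerUser=cpu=80 Priority=100
# the QOS also needs to be permitted, e.g. via the users' associations or the partition's AllowQos
# users then pick it at submission time:
sbatch --qos=debug -n 8 jobscript.sh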
Backfill works fairly well for us, although indeed not without a little badgering of users to get them to specify appropriate run-times.
> But for now, would be handy if users didn't need to adjust their jobscripts (or we didn't need to write a submit script.)
If you ditch one of the partitions, you could always use a job_submit plugin to replace the invalid partition specified by the job with the remaining partition.
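A minimal sketch of that idea in job_submit.lua (untested, and assuming JobSubmitPlugins=lua is set in slurm.conf; the partition names are just placeholders):

-- job_submit.lua: rewrite a retired partition name to the remaining partition.
-- Untested sketch; "multinode" and "standard" are placeholder names.
function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == "multinode" then
      job_desc.partition = "standard"
      slurm.log_info("job_submit: rewrote partition multinode to standard for uid %d", submit_uid)
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end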
Cheers,
Loris