Hello,
Previously we were running 22.05.10 and could submit a "multinode" job by specifying only the total number of cores, not the number of nodes. For example, on a cluster containing only 40-core nodes (no hyperthreading), Slurm would determine that two nodes were needed given just:

sbatch -p multinode -n 80 --wrap="...."
Now, in 23.02.1, this is no longer the case - we get:

sbatch: error: Batch job submission failed: Node count specification invalid
At least -N 2 must now be given (-n 80 can also be added):

sbatch -p multinode -N 2 -n 80 --wrap="...."
The partition config was, and is, as follows (MinNodes=2 is there to reject small jobs submitted to this partition - we want at least two nodes requested):

PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE Default=NO
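(With an explicit node count the limit does what we want - e.g. a single-node request like

sbatch -p multinode -N 1 -n 40 --wrap="...."

should be rejected at submit time, given EnforcePartLimits=ANY below.)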
All nodes are of the form:

NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000
slurm.conf has:

EnforcePartLimits = ANY
SelectType = select/cons_tres
TaskPlugin = task/cgroup,task/affinity
A few fields from sacctmgr show qos multinode:

Name|Flags|MaxTRES
multinode|DenyOnLimit|node=5
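(In case it's relevant: DenyOnLimit means anything over the QOS node limit should be refused at submission rather than left queued - e.g. something like

sbatch -p multinode -N 6 -n 240 --wrap="...."

should come back with the usual "Job violates accounting/QOS policy" error. Numbers there are just for illustration.)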
The sbatch/srun man page states:

-n, --ntasks
.... If -N is not specified, the default behavior is to allocate enough nodes to satisfy the requested resources as expressed by per-job specification options, e.g. -n, -c and --gpus.
I've had a look through the release notes back to 22.05.10 but can't see anything obvious (to me).
Has this behaviour changed? Or, more likely, what have I missed? ;-)
Many thanks, George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch