Hi George,
George Leaver via slurm-users <slurm-users@lists.schedmd.com> writes:
Hello,
Previously we were running 22.05.10 and could submit a "multinode" job using only the total number of cores to run, not the number of nodes. For example, in a cluster containing only 40-core nodes (no hyperthreading), Slurm would determine that two nodes were needed given only:

    sbatch -p multinode -n 80 --wrap="...."
Now in 23.02.1 this is no longer the case - we get:

    sbatch: error: Batch job submission failed: Node count specification invalid
At least -N 2 must now be used (-n 80 can be added):

    sbatch -p multinode -N 2 -n 80 --wrap="...."
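If an exact node count is more rigid than needed, the sbatch man page also documents a range form, --nodes=<minnodes>[-maxnodes]; a sketch of that (untested here against 23.02.1):

    sbatch -p multinode -N 2-4 -n 80 --wrap="...."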
The partition config was, and is, as follows (MinNodes=2 is there to reject small jobs submitted to this partition - we want at least two nodes requested):

    PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE Default=NO
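For completeness, the partition limits as seen by the running controller (standard Slurm client tooling, nothing cluster-specific assumed) can be dumped with:

    scontrol show partition multinode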
But do you really want to force a job to use two nodes if it could in fact run on one?
What is the use-case for having separate 'uninode' and 'multinode' partitions? We have a university cluster with a very wide range of jobs and essentially a single partition. Allowing all job types to use one partition means that the different resource requirements tend to complement each other to some degree. Doesn't splitting your jobs across two partitions mean that one of them could be full while the other has idle nodes?
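For illustration only, a single catch-all partition along those lines (hypothetical name; the other values carried over from the multinode config above) might look like:

    PartitionName=batch State=UP Nodes=node[081-245] Default=YES DefaultTime=168:00:00 MaxTime=168:00:00 DefMemPerCPU=4096

with any per-group caps applied through QOS limits rather than partition boundaries.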
Cheers,
Loris
All nodes are of the form:

    NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000
slurm.conf has:

    EnforcePartLimits = ANY
    SelectType        = select/cons_tres
    TaskPlugin        = task/cgroup,task/affinity
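The values the controller is actually running with can be confirmed (again, standard Slurm commands only) with:

    scontrol show config | grep -E 'EnforcePartLimits|SelectType|TaskPlugin'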
A few fields from sacctmgr show qos multinode:

    Name|Flags|MaxTRES
    multinode|DenyOnLimit|node=5
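For reference, that pipe-separated form is what a parsable query along these lines should print, assuming the usual sacctmgr options:

    sacctmgr -P show qos multinode format=Name,Flags,MaxTRES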
The sbatch/srun man page states:

    -n, --ntasks
        .... If -N is not specified, the default behavior is to allocate enough nodes to satisfy the requested resources as expressed by per-job specification options, e.g. -n, -c and --gpus.
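A minimal way to check the documented behaviour against what a given version actually does (hypothetical test job; %D in the squeue format string prints the allocated node count):

    sbatch -p multinode -n 80 --wrap="srun hostname"
    squeue --me -o "%i %D"

Under 22.05.10 Slurm allocated two nodes from this; under 23.02.1 the submission itself is rejected as above.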
I've had a look through the release notes back to 22.05.10 but can't see anything obvious (to me). Has this behaviour changed? Or, more likely, what have I missed? ;-)
Many thanks, George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch