[slurm-users] job not running if partition MaxCPUsPerNode < actual max

Diego Zuccato diego.zuccato at unibo.it
Tue Oct 3 08:17:01 UTC 2023

I've been recently hit by
that refused jobs that couldn't run in all given partitions. Solved by 
setting it to
so that the job gets queued if it can run in ANY of the given partitions (very 
useful if you also use JobSubmitPlugins=all_partitions).
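The two behaviours described above can be compared with a toy admission check. This is a simplified model, not Slurm's implementation. The `Partition` class, the `fits`/`accept` helpers, the single limit they check (total usable CPUs in a partition), and the partition names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Partition:
    """Hypothetical, simplified partition holding only the limits this sketch checks."""
    name: str
    nodes: int          # number of nodes in the partition
    cpus_per_node: int  # CPUs a job may use on each node


def fits(job_cpus: int, part: Partition) -> bool:
    """True if the job's CPU request fits within the partition's total usable CPUs."""
    return job_cpus <= part.nodes * part.cpus_per_node


def accept(job_cpus: int, partitions: Sequence[Partition], mode: str) -> bool:
    """Toy submit-time check: 'ALL' requires every listed partition to fit the job,
    'ANY' requires at least one of them to fit it."""
    results = [fits(job_cpus, p) for p in partitions]
    if mode == "ALL":
        return all(results)
    if mode == "ANY":
        return any(results)
    raise ValueError(f"unknown mode: {mode}")


part_a = Partition("part_a", nodes=2, cpus_per_node=32)  # 64 CPUs in total
part_b = Partition("part_b", nodes=8, cpus_per_node=32)  # 256 CPUs in total
print(accept(128, [part_a, part_b], "ALL"))  # False: part_a is too small
print(accept(128, [part_a, part_b], "ANY"))  # True: part_b can hold the job
```

Under "ALL", one undersized partition in the list is enough to reject the job at submission time. Under "ANY", the job is accepted and queued because at least one listed partition can run it.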


On 15/08/2023 18:43, Bernstein, Noam CIV USN NRL (6393) Washington DC 
(USA) wrote:
> We have a heterogeneous mix of nodes, mostly 32-core but one group of 
> 36-core nodes, grouped into homogeneous partitions. We like to be able to 
> specify multiple partitions so that a job can run on any homogeneous 
> group. It would be nice if we could run on all such nodes using 32 
> cores per node. To try to do this, I created a partition for the 
> 36-core nodes (call them n2019) which sets MaxCPUsPerNode=64:
>     PartitionName=n2019            DefMemPerCPU=2631 Nodes=compute-4-[0-47]
>     PartitionName=n2019_32         DefMemPerCPU=2631 Nodes=compute-4-[0-47] MaxCPUsPerNode=64
>     PartitionName=n2021            DefMemPerCPU=2960 Nodes=compute-7-[0-18]
> However, if I try to run a 128-task, 1-task-per-core job on n2019_32, 
> sbatch fails with:
>     > sbatch --ntasks=128 --exclusive --partition=n2019_32 --ntasks-per-core=1 job.pbs
>     sbatch: error: Batch job submission failed: Requested node configuration is not available
> (Please ignore the ".pbs"; it's a relic, and the job script works with 
> Slurm.) The identical command with "n2019" or "n2021" as the 
> partition works (but the former uses 36 cores per node). If I specify 
> multiple partitions, the job only runs when nodes outside n2019 (the same 
> node set as n2019_32) are available.
> The job header includes only walltime, job name, stdout/stderr files, 
> shell, and a job array range.
> I tried adding "-v" to sbatch to get more useful output, but it gave 
> no further insight. Does anyone have any idea why it's 
> rejecting my job?
> thanks,
> Noam
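The arithmetic behind the quoted setup can be checked with a short sketch. It assumes that each hardware thread counts as one Slurm CPU and that the 36-core nodes have 2 threads per core. The post does not state this, but it is the reading under which a cap of 64 CPUs equals 32 cores. The helpers `usable_cores` and `greedy_layout` are hypothetical, and the node-by-node fill is only illustrative. Slurm's real task placement depends on `--distribution` and other options.

```python
import math
from typing import List, Optional


def usable_cores(cores_per_node: int, threads_per_core: int,
                 max_cpus_per_node: Optional[int] = None) -> int:
    """Cores per node usable by a job placing one task per core.

    Assumes every hardware thread counts as one CPU, so a per-node CPU cap
    corresponds to max_cpus_per_node // threads_per_core whole cores.
    """
    if max_cpus_per_node is None:
        return cores_per_node
    return min(cores_per_node, max_cpus_per_node // threads_per_core)


def greedy_layout(ntasks: int, cores: int) -> List[int]:
    """Tasks per node if nodes are filled one after another (illustrative only)."""
    if ntasks <= 0:
        return []
    nodes = math.ceil(ntasks / cores)
    layout = [cores] * (nodes - 1)
    layout.append(ntasks - cores * (nodes - 1))  # remainder goes on the last node
    return layout


THREADS_PER_CORE = 2  # assumption, not stated in the thread
uncapped = usable_cores(36, THREADS_PER_CORE)     # like partition n2019
capped = usable_cores(36, THREADS_PER_CORE, 64)   # like partition n2019_32
print(uncapped, greedy_layout(128, uncapped))     # 36 [36, 36, 36, 20]
print(capped, greedy_layout(128, capped))         # 32 [32, 32, 32, 32]
```

Under these assumptions, the capped partition should hold the 128 tasks on four nodes at 32 tasks each. This only confirms that the request is arithmetically satisfiable. It does not explain the "Requested node configuration is not available" error.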

Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
