Hello again,
Angel de Vicente via slurm-users <slurm-users@lists.schedmd.com> writes:
[...] I don't understand is why the first three submissions below do get stopped by sbatch while the last one happily goes through?
,----
| $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not available
|
| $ sbatch -N 1 -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not available
|
| $ sbatch -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not available
`----
,----
| $ sbatch -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
| Submitted batch job 133982
`----
Ah, I think I do perhaps understand now...
In the first three cases Slurm knows that everything is going to run inside a single node (either because I explicitly set "-N 1", or because I'm submitting a single task that uses 76 CPUs: "-n 1 -c 76"). It therefore knows that the required memory (76 x 4000M = 304000M) exceeds the MaxMemPerNode configuration, and it blocks the submission.
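(For reference, that is the per-node figure those three cases imply; something along these lines should show the partition limit it gets compared against, though the exact output format may differ between Slurm versions:)

,----
| # 76 CPUs x 4000M per CPU = 304000M needed on a single node
| $ scontrol show partition short | grep -o 'MaxMemPerNode=[^ ]*'
`----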
In the last case my job consists of 76 single-CPU tasks, but I'm not explicitly asking for a number of nodes, so in theory the job could be spread across several nodes and MaxMemPerNode would not be exceeded; hence the submission is accepted (see the illustration below).
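(Just as an illustration, and assuming a partition that really had two such nodes with MaxMemPerNode above 152000M: spreading the same request explicitly would keep each node under the limit, since 38 tasks x 4000M = 152000M per node:)

,----
| $ sbatch -N 2 -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
`----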
[In my case I guess the confusion comes from the fact that there is only one node in this "cluster", so as far as memory limits are concerned I see the four cases as basically identical.]
In any case, even if the fourth job above gets past the submission phase, I would expect it to never actually run (on my system), because allocating it to a single node amounts to going back to submission #2.
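(If it does sit in the queue forever, the scheduler's reason should be visible with something like:)

,----
| $ scontrol show job 133982 | grep -i Reason
| $ squeue -j 133982 -o '%i %T %r'
`----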
Cheers,