Hello,
we found an issue with Slurm 24.05.1 and the MaxMemPerNode setting. Slurm is installed on a single workstation, so there is only one node.
The relevant sections in slurm.conf read:
,----
| EnforcePartLimits=ALL
| PartitionName=short Nodes=..... State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=76 MaxMemPerNode=231000 OverSubscribe=FORCE:1
`----
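For completeness, the effective values can be confirmed on the running system with scontrol (standard commands, nothing specific to our setup):

,----
| $ scontrol show partition short
| $ scontrol show config | grep EnforcePartLimits
`----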
Now, if I submit a job requesting 76 CPUs with 4000M each (304000M in total, well above the 231000M limit), Slurm does indeed respect the MaxMemPerNode setting and rejects the job at submission in the following cases ("-N 1" is not really necessary, as there is only one node):
,----
| $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not available
|
| $ sbatch -N 1 -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not available
|
| $ sbatch -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not available
`----
But with this submission Slurm is happy:
,----
| $ sbatch -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
| Submitted batch job 133982
`----
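The jobcomp record below already shows the allocated TRES; for a job still in the system, the same information should be visible with scontrol (a generic check, not specific to our setup):

,----
| $ scontrol show job 133982 | grep -i TRES
`----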
and the slurmjobcomp.log file does indeed tell me that the memory went above MaxMemPerNode:
,----
| JobId=133982 UserId=......(10487) GroupId=domain users(2000) Name=test JobState=CANCELLED Partition=short TimeLimit=45 StartTime=2024-09-04T09:11:17 EndTime=2024-09-04T09:11:24 NodeList=...... NodeCnt=1 ProcCnt=76 WorkDir=/tmp/. ReservationName= Tres=cpu=76,mem=304000M,node=1,billing=76 Account=ddgroup QOS=domino WcKey= Cluster=...... SubmitTime=2024-09-04T09:11:17 EligibleTime=2024-09-04T09:11:17 DerivedExitCode=0:0 ExitCode=0:0
`----
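As a further data point (untested on my side, so purely an assumption), it might be worth comparing against the same submission with the total memory requested via --mem instead of --mem-per-cpu, since that checks the per-node memory limit directly:

,----
| $ sbatch -n 76 -c 1 -p short --mem=304000M test.batch
`----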
What is the best way to report issues like this to the Slurm developers? I thought of filing it at https://support.schedmd.com/, but it is not clear to me whether that page is only meant for Slurm users with a support contract.
Cheers,