[slurm-users] how to configure correctly node and memory when a script fails with out of memory

Gérard Henry (AMU) gerard.henry at univ-amu.fr
Mon Oct 30 14:46:24 UTC 2023

Hello all,

I can't configure the Slurm script correctly. My program needs 100GB of
memory; that is the only criterion. But the job always fails with an
out-of-memory error. Here's the cluster configuration I'm using:


DefMemPerCPU=5770 MaxMemPerCPU=5778
for each node: CPUAlloc=32 RealMemory=190000 AllocMem=184640

my script contains:
#SBATCH --ntasks=60
#SBATCH --mem-per-cpu=1500M
#SBATCH --cpus-per-task=1
mpirun ../zsimpletest_analyse
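
As a sanity check (my own back-of-the-envelope arithmetic, not anything
from the Slurm docs), the total memory these directives request works
out to less than the 100GB the program needs:

```python
# Total memory implied by the #SBATCH directives above.
ntasks = 60            # --ntasks=60
cpus_per_task = 1      # --cpus-per-task=1
mem_per_cpu_mb = 1500  # --mem-per-cpu=1500M

total_mb = ntasks * cpus_per_task * mem_per_cpu_mb
total_gib = total_mb / 1024

# 90000M, which matches the ReqMem column sacct reports below,
# and is only ~87.9GiB -- already short of the 100GB needed.
print(f"requested: {total_mb}M (~{total_gib:.1f}GiB)")
```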

when it fails, sacct gives the following information:
JobID         JobName   Elapsed   NCPUS  TotalCPU   CPUTime   ReqMem  MaxRSS     MaxDiskRead  MaxDiskWrite  State       ExitCode
------------  --------  --------  -----  ---------  --------  ------  ---------  -----------  ------------  ----------  --------
8500578       analyse5  00:03:04     60  02:57:58   03:04:00  90000M                                        OUT_OF_ME+  0:125
8500578.bat+  batch     00:03:04     16  46:34.302  00:49:04          21465736K  0.23M        0.01M         OUT_OF_ME+  0:125
8500578.0     orted     00:03:05     44  02:11:24   02:15:40          40952K     0.42M        0.03M         COMPLETED   0:0

I don't understand why the batch step hits "out of memory" when its
MaxRSS is only about 21GB (21465736K), while 16 CPUs at 1500M per CPU
should allow 24000M (~23.4GiB).
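
For reference, here is the same arithmetic for the batch step, using the
numbers from the sacct output above (again my own check, nothing more):

```python
# Batch-step memory limit vs. the peak RSS sacct reported.
ncpus_batch = 16        # NCPUS of the batch step
mem_per_cpu_mb = 1500   # --mem-per-cpu=1500M
maxrss_kb = 21465736    # MaxRSS column from sacct

limit_mb = ncpus_batch * mem_per_cpu_mb  # memory available to the step
maxrss_mb = maxrss_kb / 1024             # observed peak, in MiB

# limit: 24000M, peak RSS: ~20963M -- the peak is below the limit,
# which is exactly what I don't understand about the OUT_OF_MEMORY state.
print(f"limit: {limit_mb}M, peak RSS: {maxrss_mb:.0f}M")
```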

if anybody can help?

thanks in advance

Gérard HENRY
Institut Fresnel - UMR 7249
+33 413945457
Aix-Marseille Université - Campus Etoile, BATIMENT FRESNEL, Avenue 
Escadrille Normandie Niemen, 13013 Marseille
Site : https://fresnel.fr/