[slurm-users] Yet another issue with AssocGrpMemLimit

Mahmood Naderan mahmood.nt at gmail.com
Tue May 12 11:36:01 UTC 2020


Hi
With the following memory stats on two nodes

[root at hpc slurm]# scontrol show node compute-0-0 | grep Memory
   RealMemory=64259 AllocMem=0 FreeMem=63429 Sockets=32 Boards=1
[root at hpc slurm]# scontrol show node compute-0-1 | grep Memory
   RealMemory=120705 AllocMem=1024 FreeMem=103051 Sockets=32 Boards=1

the following script

#!/bin/bash
#SBATCH --job-name=qe
#SBATCH --output=my_fb.log
#SBATCH --partition=SEA2
#SBATCH --account=fish2
#SBATCH --mem=18GB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
mpirun -np $SLURM_NTASKS /share/apps/q-e-qe-6.5/bin/pw.x -in 5.in


does not start; it stays pending with the reason AssocGrpMemLimit.
The user limit is 32G according to the following command

[root at hpc slurm]# sacctmgr list association format=partition,account,user,grptres%30 | grep mn
       sea       fish   mn                 cpu=16,mem=32G
      sea2      fish2   mn                 cpu=16,mem=32G
                local   mn


According to squeue, there is another running job as below

[root at hpc slurm]# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               492      SEA2       qe mn PD       0:00      2 (AssocGrpMemLimit)
               481       SEA U8Phi0.6 abb  R 2-18:57:06      3 compute-0-[1-2],hpc

The memory limit for the second user is 12G as below

[root at hpc slurm]# sacctmgr list association format=partition,account,user,grptres%30 | grep abbas
       sea       fish  abb                  cpu=15,mem=12G
                local  abb


May I know what exactly is limiting the memory request for the user mn?
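
One thing that may be worth checking is the total memory the job asks for. sbatch documents --mem as memory per node, so with --nodes=2 the request could add up to more than the 32G in GrpTRES. The sketch below only redoes that arithmetic with the numbers from this post. The variable names are made up for illustration, and it assumes the association limit is compared against per-node memory times the node count:

```shell
#!/bin/bash
# Values copied from the job script and the sacctmgr output in this post.
mem_per_node_gb=18    # --mem=18GB (documented by sbatch as memory per node)
nodes=2               # --nodes=2
grp_mem_limit_gb=32   # GrpTRES mem=32G for user mn, account fish2

# Assumption: the association is charged per-node memory times node count.
total_gb=$(( mem_per_node_gb * nodes ))

if (( total_gb > grp_mem_limit_gb )); then
    verdict="over limit"
else
    verdict="within limit"
fi

echo "requested ${total_gb}G vs limit ${grp_mem_limit_gb}G: ${verdict}"
```

If that assumption holds, then --mem=16GB on two nodes would come to exactly 32G, and the job of user abb should not matter, since it runs under a different association.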



Regards,
Mahmood


