[slurm-users] Yet another issue with AssocGrpMemLimit
Mahmood Naderan
mahmood.nt at gmail.com
Tue May 12 11:36:01 UTC 2020
Hi
With the following memory stats on two nodes
[root@hpc slurm]# scontrol show node compute-0-0 | grep Memory
RealMemory=64259 AllocMem=0 FreeMem=63429 Sockets=32 Boards=1
[root@hpc slurm]# scontrol show node compute-0-1 | grep Memory
RealMemory=120705 AllocMem=1024 FreeMem=103051 Sockets=32 Boards=1
the following script
#!/bin/bash
#SBATCH --job-name=qe
#SBATCH --output=my_fb.log
#SBATCH --partition=SEA2
#SBATCH --account=fish2
#SBATCH --mem=18GB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
mpirun -np $SLURM_NTASKS /share/apps/q-e-qe-6.5/bin/pw.x -in 5.in
fails with AssocGrpMemLimit error.
The user limit is 32G according to the following command
[root@hpc slurm]# sacctmgr list association format=partition,account,user,grptres%30 | grep mn
sea fish mn cpu=16,mem=32G
sea2 fish2 mn cpu=16,mem=32G
local mn
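One thing that may be worth checking here is the total memory the job asks for. The sketch below is only arithmetic on the numbers already shown, and it rests on an assumption: that --mem is counted per node (as the sbatch documentation describes it) and that the mem value in GrpTRES is compared against the sum over all nodes of the job. The variable names are made up for illustration.

```shell
#!/bin/bash
# Values copied from the job script and the sacctmgr output above.
mem_per_node_gb=18    # from "#SBATCH --mem=18GB" (assumed to be per node)
nodes=2               # from "#SBATCH --nodes=2"
grp_mem_limit_gb=32   # from "cpu=16,mem=32G" for sea2 / fish2 / mn

# Total memory the job would be charged under the per-node assumption.
total_gb=$(( mem_per_node_gb * nodes ))
echo "total requested: ${total_gb}G, association limit: ${grp_mem_limit_gb}G"

if (( total_gb > grp_mem_limit_gb )); then
    echo "request exceeds the limit"
else
    echo "request fits within the limit"
fi
```

Under that assumption the request comes to 36G against a 32G limit, so the two figures are worth comparing before looking at other jobs.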
According to squeue, there is another running job as below
[root@hpc slurm]# squeue
JOBID PARTITION NAME     USER ST TIME       NODES NODELIST(REASON)
492   SEA2      qe       mn   PD 0:00       2     (AssocGrpMemLimit)
481   SEA       U8Phi0.6 abb  R  2-18:57:06 3     compute-0-[1-2],hpc
The memory limit for the second user is 12G as below
[root@hpc slurm]# sacctmgr list association format=partition,account,user,grptres%30 | grep abbas
sea fish abb cpu=15,mem=12G
local abb
May I know what exactly is limiting the memory request for the user mn?
Regards,
Mahmood