[slurm-users] Question about memory allocation

Mahmood Naderan mahmood.nt at gmail.com
Tue Dec 17 09:03:06 UTC 2019


Please see the latest update below:

# for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory;
done && scontrol show node hpc | grep RealMemory
   RealMemory=64259 AllocMem=1024 FreeMem=57163 Sockets=32 Boards=1
   RealMemory=120705 AllocMem=1024 FreeMem=97287 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=40045 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=24154 Sockets=10 Boards=1
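
For reference, a compact way to see CPU and memory availability per node at the
same time is something like the following (just a convenience sketch, not run
here; the scontrol output above already shows the memory figures):

# Sketch: one line per node with allocated/idle/other/total CPUs (%C),
# configured memory in MB (%m) and currently free memory (%e)
sinfo -N -o "%N %C %m %e"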



$ sbatch slurm_qe.sh
Submitted batch job 125
$ squeue
  JOBID PARTITION   NAME    USER ST  TIME NODES NODELIST(REASON)
    125       SEA  qe-fb mahmood PD  0:00     4 (Resources)
    124       SEA U1phi1  abspou  R  3:52     4 compute-0-[0-2],hpc
$ scontrol show -d job 125
JobId=125 JobName=qe-fb
   UserId=mahmood(1000) GroupId=mahmood(1000) MCS_label=N/A
   Priority=1751 Nice=0 Account=fish QOS=normal WCKey=*default
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=30-00:00:00 TimeMin=N/A
   SubmitTime=2019-12-17T12:29:08 EligibleTime=2019-12-17T12:29:08
   AccrueTime=2019-12-17T12:29:08
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T12:29:09
   Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:22742
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=4-4 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=40G,node=4,billing=20
   Socks/Node=* NtasksPerN:B:S:C=5:0:*:* CoreSpec=*
   MinCPUsNode=5 MinMemoryNode=10G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/mahmood/qe/f_borophene/slurm_qe.sh
   WorkDir=/home/mahmood/qe/f_borophene
   StdErr=/home/mahmood/qe/f_borophene/my_fb.log
   StdIn=/dev/null
   StdOut=/home/mahmood/qe/f_borophene/my_fb.log
   Power=

$ cat slurm_qe.sh
#!/bin/bash
#SBATCH --job-name=qe-fb
#SBATCH --output=my_fb.log
#SBATCH --partition=SEA
#SBATCH --account=fish
#SBATCH --mem=10GB
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=5
mpirun -np $SLURM_NTASKS /share/apps/q-e-qe-6.5/bin/pw.x -in f_borophene_scf.in
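
For what it's worth, my understanding is that --mem=10GB above is a per-node
request, so with --nodes=4 the job needs 10GB available on each of the four
nodes. If the intent were to express the memory per task instead, a sketch of
the same script with --mem-per-cpu (the 2G value is only an assumption: 2G x 5
tasks gives the same 10GB per node) would look like this:

#!/bin/bash
#SBATCH --job-name=qe-fb
#SBATCH --output=my_fb.log
#SBATCH --partition=SEA
#SBATCH --account=fish
#SBATCH --mem-per-cpu=2G      # sketch: per-CPU memory instead of per-node --mem=10GB
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=5
mpirun -np $SLURM_NTASKS /share/apps/q-e-qe-6.5/bin/pw.x -in f_borophene_scf.in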




You can also see the details of job 124:


$ scontrol show -d job 124
JobId=124 JobName=U1phi1
   UserId=abspou(1002) GroupId=abspou(1002) MCS_label=N/A
   Priority=958 Nice=0 Account=fish QOS=normal WCKey=*default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:06:17 TimeLimit=30-00:00:00 TimeMin=N/A
   SubmitTime=2019-12-17T12:25:17 EligibleTime=2019-12-17T12:25:17
   AccrueTime=2019-12-17T12:25:17
   StartTime=2019-12-17T12:25:17 EndTime=2020-01-16T12:25:17 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T12:25:17
   Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:20085
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=compute-0-[0-2],hpc
   BatchHost=compute-0-0
   NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=24,mem=4G,node=4,billing=24
   Socks/Node=* NtasksPerN:B:S:C=6:0:*:* CoreSpec=*
     Nodes=compute-0-[0-2],hpc CPU_IDs=0-5 Mem=1024 GRES=
   MinCPUsNode=6 MinMemoryNode=1G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/slurm_script.sh
   WorkDir=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1
   StdErr=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
   StdIn=/dev/null
   StdOut=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
   Power=
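
If it helps with the diagnosis, this is a small loop that compares allocated
vs. total CPUs and memory on each of the four nodes (only a sketch, output
omitted here), to check whether every node still has enough CPUs and memory
for job 125's per-node request of 5 tasks and 10GB:

# Sketch: print the NodeName, CPUAlloc/CPUTot and RealMemory/AllocMem/FreeMem
# lines for each node currently used by job 124
for n in compute-0-0 compute-0-1 compute-0-2 hpc; do
    scontrol show node "$n" | grep -E 'NodeName|CPUAlloc|RealMemory'
done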


I cannot figure out the root of the problem: job 125 stays pending with
Reason=Resources even though the nodes appear to have enough free memory for
the 10GB-per-node request.



Regards,
Mahmood




On Tue, Dec 17, 2019 at 11:18 AM Marcus Wagner <wagner at itc.rwth-aachen.de>
wrote:

> Dear Mahmood,
>
> could you please show the output of
>
> scontrol show -d job 119
>
> Best
> Marcus
>