[slurm-users] Question about memory allocation
Mahmood Naderan
mahmood.nt at gmail.com
Tue Dec 17 09:03:06 UTC 2019
Please see the latest update:
# for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory; done && scontrol show node hpc | grep RealMemory
RealMemory=64259 AllocMem=1024 FreeMem=57163 Sockets=32 Boards=1
RealMemory=120705 AllocMem=1024 FreeMem=97287 Sockets=32 Boards=1
RealMemory=64259 AllocMem=1024 FreeMem=40045 Sockets=32 Boards=1
RealMemory=64259 AllocMem=1024 FreeMem=24154 Sockets=10 Boards=1
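For a compact per-node summary of the same memory numbers, something like this should work too (field names as documented in the sinfo man page):

$ sinfo -N -p SEA -O "nodelist,memory,allocmem,freemem,statelong"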
$ sbatch slurm_qe.sh
Submitted batch job 125
$ squeue
 JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
   125       SEA  qe-fb  mahmood PD  0:00     4 (Resources)
   124       SEA U1phi1   abspou  R  3:52     4 compute-0-[0-2],hpc
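If useful, squeue can also report the scheduler's estimated start time for the pending job:

$ squeue --start -j 125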
$ scontrol show -d job 125
JobId=125 JobName=qe-fb
UserId=mahmood(1000) GroupId=mahmood(1000) MCS_label=N/A
Priority=1751 Nice=0 Account=fish QOS=normal WCKey=*default
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:00 TimeLimit=30-00:00:00 TimeMin=N/A
SubmitTime=2019-12-17T12:29:08 EligibleTime=2019-12-17T12:29:08
AccrueTime=2019-12-17T12:29:08
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T12:29:09
Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:22742
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=4-4 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=20,mem=40G,node=4,billing=20
Socks/Node=* NtasksPerN:B:S:C=5:0:*:* CoreSpec=*
MinCPUsNode=5 MinMemoryNode=10G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/mahmood/qe/f_borophene/slurm_qe.sh
WorkDir=/home/mahmood/qe/f_borophene
StdErr=/home/mahmood/qe/f_borophene/my_fb.log
StdIn=/dev/null
StdOut=/home/mahmood/qe/f_borophene/my_fb.log
Power=
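So job 125 is asking for 5 CPUs and 10G of memory on each of its 4 nodes (MinCPUsNode=5, MinMemoryNode=10G), i.e. 4 x 10G = 40G in total, which matches the TRES mem=40G shown above. Note that --mem is a per-node limit, so every selected node must have 10G of unallocated memory.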
$ cat slurm_qe.sh
#!/bin/bash
#SBATCH --job-name=qe-fb
#SBATCH --output=my_fb.log
#SBATCH --partition=SEA
#SBATCH --account=fish
#SBATCH --mem=10GB
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=5
mpirun -np $SLURM_NTASKS /share/apps/q-e-qe-6.5/bin/pw.x -in f_borophene_scf.in
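Note that --mem=10GB in the script is per node, not per job. If I wanted to express the same request per CPU instead, with 5 tasks per node an equivalent form (illustrative only, not what I actually ran) would be:

#SBATCH --mem-per-cpu=2G   # 5 tasks/node x 2G = 10G per node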
You can also see the job details of the running job 124:
$ scontrol show -d job 124
JobId=124 JobName=U1phi1
UserId=abspou(1002) GroupId=abspou(1002) MCS_label=N/A
Priority=958 Nice=0 Account=fish QOS=normal WCKey=*default
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:06:17 TimeLimit=30-00:00:00 TimeMin=N/A
SubmitTime=2019-12-17T12:25:17 EligibleTime=2019-12-17T12:25:17
AccrueTime=2019-12-17T12:25:17
StartTime=2019-12-17T12:25:17 EndTime=2020-01-16T12:25:17 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T12:25:17
Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:20085
ReqNodeList=(null) ExcNodeList=(null)
NodeList=compute-0-[0-2],hpc
BatchHost=compute-0-0
NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=24,mem=4G,node=4,billing=24
Socks/Node=* NtasksPerN:B:S:C=6:0:*:* CoreSpec=*
Nodes=compute-0-[0-2],hpc CPU_IDs=0-5 Mem=1024 GRES=
MinCPUsNode=6 MinMemoryNode=1G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/slurm_script.sh
WorkDir=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1
StdErr=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
StdIn=/dev/null
StdOut=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
Power=
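Job 124 holds CPU_IDs=0-5 (6 CPUs) and Mem=1024 (MB) on each of the four nodes, so a per-node breakdown of allocated vs. idle CPUs might reveal what job 125 is waiting for; cpusstate is a documented sinfo format field that prints allocated/idle/other/total:

$ sinfo -N -p SEA -O "nodelist,cpusstate,memory,allocmem"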
I cannot figure out the root of the problem.
Regards,
Mahmood
On Tue, Dec 17, 2019 at 11:18 AM Marcus Wagner <wagner at itc.rwth-aachen.de> wrote:
> Dear Mahmood,
>
> could you please show the output of
>
> scontrol show -d job 119
>
> Best
> Marcus
>