[slurm-users] Question about memory allocation

Sean Crosby scrosby at unimelb.edu.au
Tue Dec 17 12:06:41 UTC 2019


What services did you restart after changing the slurm.conf? Did you do an scontrol reconfigure?

Do you have any reservations? scontrol show res
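
For reference, applying a slurm.conf change and checking for reservations usually looks something like the sketch below (assuming the edited slurm.conf has been copied to every node and the daemons run as the stock systemd units; changes to NodeName definitions generally need a daemon restart rather than just a reconfigure):

# scontrol reconfigure               (re-reads slurm.conf for most settings)
# systemctl restart slurmctld        (on the head node)
# systemctl restart slurmd           (on each compute node)
# scontrol show res                  (any active reservation here can keep a job pending)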

Sean

On Tue, 17 Dec. 2019, 10:35 pm Mahmood Naderan, <mahmood.nt at gmail.com> wrote:
>Your running job is requesting 6 CPUs per node (4 nodes, 6 CPUs per node). That means 6 CPUs are being used on node hpc.
>Your queued job is requesting 5 CPUs per node (4 nodes, 5 CPUs per node). In total, if it was running, that would require 11 CPUs on node hpc. But hpc only has 10 cores, so it can't run.

Right... I changed that, but the job is still in the pending state.
I modified /etc/slurm/slurm.conf as below:

# grep hpc /etc/slurm/slurm.conf
NodeName=hpc NodeAddr=10.1.1.1 CPUs=11
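
Side note: when only CPUs= is given, Slurm infers Sockets equal to CPUs and CoresPerSocket=1, which matches the Sockets=11 reported for hpc and Sockets=32 for the compute nodes below. A node line that spells out the real topology would look roughly like this sketch; the socket/core/thread counts and RealMemory are taken from the hardware details further down in this mail, so treat the exact values as assumptions:

NodeName=hpc NodeAddr=10.1.1.1 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64259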


# for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory; done && scontrol show node hpc | grep RealMemory
   RealMemory=64259 AllocMem=1024 FreeMem=57116 Sockets=32 Boards=1
   RealMemory=120705 AllocMem=1024 FreeMem=66403 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=39966 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=49189 Sockets=11 Boards=1
# for i in {0..2}; do scontrol show node compute-0-$i | grep CPUTot; done && scontrol show node hpc | grep CPUTot
   CPUAlloc=6 CPUTot=32 CPULoad=5.18
   CPUAlloc=6 CPUTot=32 CPULoad=18.94
   CPUAlloc=6 CPUTot=32 CPULoad=5.41
   CPUAlloc=6 CPUTot=11 CPULoad=5.21


But the job is still pending:

$ scontrol show -d job 129
JobId=129 JobName=qe-fb
   UserId=mahmood(1000) GroupId=mahmood(1000) MCS_label=N/A
   Priority=1751 Nice=0 Account=fish QOS=normal WCKey=*default
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=30-00:00:00 TimeMin=N/A
   SubmitTime=2019-12-17T15:00:37 EligibleTime=2019-12-17T15:00:37
   AccrueTime=2019-12-17T15:00:37
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T15:00:38
   Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:14534
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=4-4 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=40G,node=4,billing=20
   Socks/Node=* NtasksPerN:B:S:C=5:0:*:* CoreSpec=*
   MinCPUsNode=5 MinMemoryNode=10G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/mahmood/qe/f_borophene/slurm_qe.sh
   WorkDir=/home/mahmood/qe/f_borophene
   StdErr=/home/mahmood/qe/f_borophene/my_fb.log
   StdIn=/dev/null
   StdOut=/home/mahmood/qe/f_borophene/my_fb.log
   Power=
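
For what it's worth, the request shown above (4 nodes, 5 tasks per node, 10G per node) would come from sbatch directives along the following lines. This is only a guess at what slurm_qe.sh contains, reconstructed from the scontrol output, not the actual script:

#!/bin/bash
#SBATCH --job-name=qe-fb
#SBATCH --partition=SEA
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=5
#SBATCH --mem=10G
#SBATCH --output=my_fb.log
#SBATCH --error=my_fb.log
# ... application command goes here ...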


>I'm not aware of any nodes that have 32, or even 10, sockets. Are you sure you want to use the cluster like that?

Marcus,
I have installed Slurm via the slurm roll on Rocks. All 4 nodes are dual-socket Opteron 6282 machines with the following specs:
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2

I just set CPUs=11 for the head node so that jobs do not fully utilize it.
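
If the goal is just to keep a few cores on the head node away from jobs, one alternative (a sketch, not something tested here; the count of 4 is only an example) is to declare the real topology and reserve cores with CoreSpecCount instead of shrinking CPUs. How strictly the reserved cores are enforced depends on the Slurm version and task plugin in use:

NodeName=hpc NodeAddr=10.1.1.1 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64259 CoreSpecCount=4

With something like this, Slurm sees the node's true layout but does not allocate the specialized cores to jobs.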

For example, compute-0-0 is

$ scontrol show node compute-0-0
NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=6 CPUTot=32 CPULoad=5.15
   AvailableFeatures=rack-0,32CPUs
   ActiveFeatures=rack-0,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.254 NodeHostName=compute-0-0
   OS=Linux 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 30 14:19:46 UTC 2019
   RealMemory=64259 AllocMem=1024 FreeMem=57050 Sockets=32 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=444124 Weight=20511900 Owner=N/A MCS_label=N/A
   Partitions=CLUSTER,WHEEL,SEA
   BootTime=2019-10-10T19:01:38 SlurmdStartTime=2019-12-17T13:50:37
   CfgTRES=cpu=32,mem=64259M,billing=47
   AllocTRES=cpu=6,mem=1G
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
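
As a sanity check, slurmd itself can report the hardware it detects on a node, which can be compared with (or copied into) the NodeName lines in slurm.conf. Run as root on the node in question:

# slurmd -C

It prints a NodeName= line with the detected Sockets, CoresPerSocket, ThreadsPerCore and RealMemory, which for the dual-socket Opterons described above should show 2 sockets with 8 cores each rather than 32 single-core sockets.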


Regards,
Mahmood



