[slurm-users] Use all cores with HT node

Jeffrey Frey frey at udel.edu
Fri Dec 7 07:02:49 MST 2018


I ran into this myself.  By default Slurm allocates hyperthreads in pairs (both threads of a core are allocated together).  The only reliable way I found to force each hyperthread to count as its own core is to declare them as full-fledged cores in the config:


NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=40 ThreadsPerCore=1 RealMemory=385630 TmpDisk=217043


and then I make sure those nodes carry an "HT" feature as a reminder that they're configured with HT enabled -- it also lets users request nodes with or without the "HT" feature.
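For reference, a minimal sketch of what that slurm.conf stanza might look like (the node name and counts below are just the example values from this thread; the Feature tag name is arbitrary):

```
# Declare each hyperthread as a full core: 2 sockets x 40 "cores", 1 thread each.
# The physical hardware is actually 2 x 20 cores with 2 threads per core.
NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=40 ThreadsPerCore=1 RealMemory=385630 TmpDisk=217043 Feature=HT
```

Users can then opt in or out of HT-enabled nodes at submission time, e.g. `srun --constraint=HT ...` to target them, or `sbatch --constraint='!HT' ...` to avoid them.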





> On Dec 7, 2018, at 6:12 AM, Sidiney Crescencio <sidiney.crescencio at clustervision.com> wrote:
> 
> Hello All,
> 
> I'm facing some issues using HT on my compute nodes. I'm running Slurm 17.02.7.
> 
> SelectTypeParameters    = CR_CORE_MEMORY
> 
> cgroup.conf
> 
> CgroupAutomount=yes
> CgroupReleaseAgentDir="/etc/slurm/cgroup"
> 
> # cpuset subsystem
> ConstrainCores=yes
> TaskAffinity=no
> 
> # memory subsystem
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> 
> # device subsystem
> ConstrainDevices=yes
> 
> If I try to allocate all 80 CPUs it doesn't work, and I couldn't find out why. Do you have any ideas on what could cause this issue? I've been playing with several different parameters in the node definition, and also with --threads-per-core, etc., but I should still be able to allocate the 80 CPUs.
> 
> Thanks in advance.
> 
> srun --reservation=test_ht -p defq -n 80 sleep 100
> srun: error: Unable to allocate resources: Requested node configuration is not available
> 
> --------------
> 
> [root at csk007 ~]# slurmd -C
> NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=385630 TmpDisk=217043
> UpTime=84-00:36:44
> [root at csk007 ~]# scontrol show node csk007
> NodeName=csk007 Arch=x86_64 CoresPerSocket=20
>    CPUAlloc=0 CPUErr=0 CPUTot=80 CPULoad=4.03
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=csk007 NodeHostName=csk007 Version=17.02
>    OS=Linux RealMemory=380000 AllocMem=0 FreeMem=338487 Sockets=2 Boards=1
>    State=RESERVED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=defq
>    BootTime=2018-09-14T12:31:05 SlurmdStartTime=2018-11-29T15:25:03
>    CfgTRES=cpu=80,mem=380000M
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> 
> -----------------------
> 
> -- 
> Best Regards, 
> Sidiney


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::



