[slurm-users] Use all cores with HT node
Jeffrey Frey
frey at udel.edu
Fri Dec 7 07:02:49 MST 2018
I ran into this myself. By default Slurm allocates hyperthreads as pairs (both threads associated with a single core). The only adequate way I found to force one HT = one core was to declare them as full-fledged cores in the config:
NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=40 ThreadsPerCore=1 RealMemory=385630 TmpDisk=217043
and then I make sure those nodes carry an "HT" feature to remind me they're configured with hyperthreading enabled -- it also lets users request nodes with or without the "HT" feature.
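A minimal sketch of the two pieces together (the feature tag name "HT" is just my convention; the node name and counts are taken from the config line above):

NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=40 ThreadsPerCore=1 RealMemory=385630 TmpDisk=217043 Feature=HT

Users can then target those nodes explicitly, e.g.:

srun --constraint=HT -n 80 sleep 100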
> On Dec 7, 2018, at 6:12 AM, Sidiney Crescencio <sidiney.crescencio at clustervision.com> wrote:
>
> Hello All,
>
> I'm facing some issues using HT on my compute nodes; I'm running Slurm 17.02.7
>
> SelectTypeParameters = CR_CORE_MEMORY
>
> cgroup.conf
>
> CgroupAutomount=yes
> CgroupReleaseAgentDir="/etc/slurm/cgroup"
>
> # cpuset subsystem
> ConstrainCores=yes
> TaskAffinity=no
>
> # memory subsystem
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
>
> # device subsystem
> ConstrainDevices=yes
>
> If I try to allocate all 80 CPUs it does not work, and I couldn't find out why. Do you have any idea what could be causing this? I've been playing with several different parameters in the node definition, and also with --threads-per-core, etc., but I still can't allocate the 80 CPUs even though I should be able to.
>
> Thanks in advance.
>
> srun --reservation=test_ht -p defq -n 80 sleep 100
> srun: error: Unable to allocate resources: Requested node configuration is not available
>
> --------------
>
> [root at csk007 ~]# slurmd -C
> NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=385630 TmpDisk=217043
> UpTime=84-00:36:44
> [root at csk007 ~]# scontrol show node csk007
> NodeName=csk007 Arch=x86_64 CoresPerSocket=20
> CPUAlloc=0 CPUErr=0 CPUTot=80 CPULoad=4.03
> AvailableFeatures=(null)
> ActiveFeatures=(null)
> Gres=(null)
> NodeAddr=csk007 NodeHostName=csk007 Version=17.02
> OS=Linux RealMemory=380000 AllocMem=0 FreeMem=338487 Sockets=2 Boards=1
> State=RESERVED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> Partitions=defq
> BootTime=2018-09-14T12:31:05 SlurmdStartTime=2018-11-29T15:25:03
> CfgTRES=cpu=80,mem=380000M
> AllocTRES=
> CapWatts=n/a
> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> -----------------------
>
> --
> Best Regards,
> Sidiney
::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034 Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::