[slurm-users] Slurm 1 CPU

Colas Rivière riviere at umdgrb.umd.edu
Thu Apr 4 22:46:30 UTC 2019


Hello,

Did you try adding something like this to slurm.conf?

    NodeName=cnode001 CPUs=48
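
If the nodes all match the lscpu output quoted below (dual-socket, 24 cores per
socket, no hyper-threading), the full topology could be spelled out instead. A
sketch, assuming all 20 nodes are identical; adjust RealMemory to what the
nodes actually report:

    NodeName=cnode[001-020] Sockets=2 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=192080

Running "slurmd -C" on a compute node prints the detected hardware in
slurm.conf format, so that line can be copied rather than guessed. With the
default FastSchedule=1, Slurm schedules against what slurm.conf says rather
than what the hardware reports, and a node defined without any CPU counts
defaults to CPUs=1, which matches what scontrol and sinfo are showing.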

Cheers,
Colas
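
P.S. slurm.conf needs to be identical on the head node and all compute nodes,
and node definition changes usually need slurmctld and the slurmd daemons to
be restarted ("scontrol reconfigure" alone may not pick them up). A rough
sketch, assuming the config lives in /etc/slurm/ and pdsh is available; since
the cluster is managed with Bright, the change may instead belong in Bright's
own tooling (cmsh/cmgui) so it is not overwritten:

    # push the updated config to every node, then restart the daemons
    for n in $(seq -w 1 20); do scp /etc/slurm/slurm.conf cnode0${n}:/etc/slurm/; done
    systemctl restart slurmctld
    pdsh -w 'cnode[001-020]' systemctl restart slurmd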

On 2019-04-04 17:18, Chris Bateson wrote:
> I should start out by saying that I am extremely new to anything HPC.  
> Our end users purchased a 20-node cluster, which a vendor set up for us 
> with Bright/Slurm.
>
> After our vendor said everything was complete and we started migrating 
> our users' workflows to the new cluster, they discovered that they can't 
> run more than 1 job per node at a time.  We started researching how to 
> enable consumable resources, which I believe we've done, but we're still 
> getting the same result.
>
> I've just discovered today that both *scontrol show node* and *sinfo 
> -lNe* show that each of our nodes has 1 CPU.  I'm guessing that's why 
> we can't submit more than 1 job at a time.  I'm trying to determine 
> where Slurm is getting this information and how I can get it to display 
> the correct CPU information.
>
> Sample info:
>
> *scontrol show node*
>
>     NodeName=cnode001 Arch=x86_64 CoresPerSocket=1
>        CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.01
>        AvailableFeatures=(null)
>        ActiveFeatures=(null)
>        Gres=(null)
>        NodeAddr=cnode001 NodeHostName=cnode001 Version=17.11
>        OS=Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017
>        RealMemory=192080 AllocMem=0 FreeMem=188798 Sockets=1 Boards=1
>        State=IDLE ThreadsPerCore=1 TmpDisk=2038 Weight=1 Owner=N/A
>     MCS_label=N/A
>        Partitions=defq
>        BootTime=2019-03-26T14:28:24 SlurmdStartTime=2019-03-26T14:29:55
>        CfgTRES=cpu=1,mem=192080M,billing=1
>        AllocTRES=
>        CapWatts=n/a
>        CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>        ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> *sinfo -lNe*
>
>     NODELIST   NODES PARTITION       STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
>     cnode001       1     defq*        idle    1 1:1:1 192080     2038      1   (null)   none
>
>
> *lscpu*
>
>     Architecture:          x86_64
>     CPU op-mode(s):        32-bit, 64-bit
>     Byte Order:            Little Endian
>     CPU(s):                48
>     On-line CPU(s) list:   0-47
>     Thread(s) per core:    1
>     Core(s) per socket:    24
>     Socket(s):             2
>     NUMA node(s):          2
>     Vendor ID:             GenuineIntel
>     CPU family:            6
>     Model:                 85
>     Model name:            Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
>     Stepping:              4
>     CPU MHz:               2700.000
>     BogoMIPS:              5400.00
>     Virtualization:        VT-x
>     L1d cache:             32K
>     L1i cache:             32K
>     L2 cache:              1024K
>     L3 cache:              33792K
>     NUMA node0 CPU(s):
>      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
>     NUMA node1 CPU(s):
>      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
>     Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>     mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
>     ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
>     arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
>     aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
>     est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
>     movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
>     abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi
>     flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
>     erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap
>     clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
>     cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat
>     pln pts
>
>
> *slurm.conf SelectType Configuration*
>
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_Core_Memory
>     PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL
>     PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO
>     Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO
>     AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO
>     OverSubscribe=YES OverTimeLimit=0 State=UP Nodes=cnode[001-020]
>
>
>
> I can provide other configs if you feel that it could help.
>
> Any ideas?  I would have thought that Slurm would grab the CPU 
> information from the hardware itself instead of from the configuration.
>
> Thanks
> Chris
>
>
