[slurm-users] Slurm 1 CPU
Colas Rivière
riviere at umdgrb.umd.edu
Thu Apr 4 22:46:30 UTC 2019
Hello,
Did you try adding something like this to slurm.conf?
NodeName=cnode001 CPUs=48
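A fuller definition, just a sketch based on the lscpu output you posted and assuming all 20 nodes are identical, would be:

NodeName=cnode[001-020] Sockets=2 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=192080 State=UNKNOWN

After changing slurm.conf, make sure the file reaches all nodes and restart slurmctld and the slurmd daemons so the new definition takes effect.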
Cheers,
Colas
On 2019-04-04 17:18, Chris Bateson wrote:
> I should start out by saying that I am extremely new to anything HPC.
> Our end users purchased a 20-node cluster, which a vendor set up for us
> with Bright/Slurm.
>
> After our vendor said everything was complete and we started migrating
> our users' workflows to the new cluster, they discovered that they can't
> run more than 1 job per node at a time. We started researching how to
> enable consumable resources, which I believe we've done; however, we're
> still getting the same result.
>
> I've just discovered today that both *scontrol show node* and *sinfo
> -lNe* show that each of our nodes has 1 CPU. I'm guessing that's why
> we can't submit more than 1 job at a time. I'm trying to determine
> where it is getting this information and how I can get it to display
> the correct CPU information.
>
> Sample info:
>
> *scontrol show node*
>
> NodeName=cnode001 Arch=x86_64 CoresPerSocket=1
> CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.01
> AvailableFeatures=(null)
> ActiveFeatures=(null)
> Gres=(null)
> NodeAddr=cnode001 NodeHostName=cnode001 Version=17.11
> OS=Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017
> RealMemory=192080 AllocMem=0 FreeMem=188798 Sockets=1 Boards=1
> State=IDLE ThreadsPerCore=1 TmpDisk=2038 Weight=1 Owner=N/A
> MCS_label=N/A
> Partitions=defq
> BootTime=2019-03-26T14:28:24 SlurmdStartTime=2019-03-26T14:29:55
> CfgTRES=cpu=1,mem=192080M,billing=1
> AllocTRES=
> CapWatts=n/a
> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> *sinfo -lNe*
>
> NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
> cnode001     1 defq*     idle     1 1:1:1 192080     2038      1 (null)   none
>
>
> *lscpu*
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 48
> On-line CPU(s) list: 0-47
> Thread(s) per core: 1
> Core(s) per socket: 24
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 85
> Model name: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
> Stepping: 4
> CPU MHz: 2700.000
> BogoMIPS: 5400.00
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 1024K
> L3 cache: 33792K
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
> aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
> abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi
> flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap
> clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat
> pln pts
>
>
> *slurm.conf SelectType Configuration*
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL
> PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO
> Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO
> AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO
> OverSubscribe=YES OverTimeLimit=0 State=UP Nodes=cnode[001-020]
>
>
>
> I can provide other configs if you feel that could help.
>
> Any ideas? I would have thought that Slurm would grab the CPU
> information from the hardware instead of the configuration.
>
> Thanks
> Chris
>
>
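Regarding where it gets this information: if I remember correctly, with the default FastSchedule setting, Slurm 17.11 schedules from the node definitions in slurm.conf rather than from the hardware it detects, and a node defined without CPUs/Sockets defaults to 1 CPU. As a quick check of what slurmd itself detects on a compute node, you can run:

slurmd -C

which should print a slurm.conf-ready line along these lines (just what I'd expect from your lscpu output; the exact fields vary by version):

NodeName=cnode001 CPUs=48 Boards=1 SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=192080

That line can be pasted into slurm.conf as a starting point for the node definition above.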