[slurm-users] Strange error, submission denied

Marcus Wagner wagner at itc.rwth-aachen.de
Thu Feb 14 07:27:00 UTC 2019


Hi Chris,


these are 96-thread nodes with 48 cores. You are right that if we set it
to 24, the job gets scheduled, but then only half of the node is
used. On the other hand, if I only use --ntasks=48, Slurm schedules all
tasks onto the same node. The hyperthread of each core is included in
the cgroup, and the task_affinity plugin also correctly binds the
hyperthread together with its core (output of a small, ugly test script
of ours; the last two numbers are the core and its hyperthread):

ncm0728.hpc.itc.rwth-aachen.de <0> OMP_STACKSIZE: <#> unlimited+p2 +pemap 0,48
ncm0728.hpc.itc.rwth-aachen.de <10> OMP_STACKSIZE: <#> unlimited+p2 +pemap 26,74
ncm0728.hpc.itc.rwth-aachen.de <11> OMP_STACKSIZE: <#> unlimited+p2 +pemap 29,77
ncm0728.hpc.itc.rwth-aachen.de <12> OMP_STACKSIZE: <#> unlimited+p2 +pemap 6,54
ncm0728.hpc.itc.rwth-aachen.de <13> OMP_STACKSIZE: <#> unlimited+p2 +pemap 9,57
ncm0728.hpc.itc.rwth-aachen.de <14> OMP_STACKSIZE: <#> unlimited+p2 +pemap 30,78
ncm0728.hpc.itc.rwth-aachen.de <15> OMP_STACKSIZE: <#> unlimited+p2 +pemap 33,81
ncm0728.hpc.itc.rwth-aachen.de <16> OMP_STACKSIZE: <#> unlimited+p2 +pemap 7,55
ncm0728.hpc.itc.rwth-aachen.de <17> OMP_STACKSIZE: <#> unlimited+p2 +pemap 10,58
ncm0728.hpc.itc.rwth-aachen.de <18> OMP_STACKSIZE: <#> unlimited+p2 +pemap 31,79
ncm0728.hpc.itc.rwth-aachen.de <19> OMP_STACKSIZE: <#> unlimited+p2 +pemap 34,82
ncm0728.hpc.itc.rwth-aachen.de <1> OMP_STACKSIZE: <#> unlimited+p2 +pemap 3,51
ncm0728.hpc.itc.rwth-aachen.de <20> OMP_STACKSIZE: <#> unlimited+p2 +pemap 8,56
ncm0728.hpc.itc.rwth-aachen.de <21> OMP_STACKSIZE: <#> unlimited+p2 +pemap 11,59
ncm0728.hpc.itc.rwth-aachen.de <22> OMP_STACKSIZE: <#> unlimited+p2 +pemap 32,80
ncm0728.hpc.itc.rwth-aachen.de <23> OMP_STACKSIZE: <#> unlimited+p2 +pemap 35,83
ncm0728.hpc.itc.rwth-aachen.de <24> OMP_STACKSIZE: <#> unlimited+p2 +pemap 12,60
ncm0728.hpc.itc.rwth-aachen.de <25> OMP_STACKSIZE: <#> unlimited+p2 +pemap 15,63
ncm0728.hpc.itc.rwth-aachen.de <26> OMP_STACKSIZE: <#> unlimited+p2 +pemap 36,84
ncm0728.hpc.itc.rwth-aachen.de <27> OMP_STACKSIZE: <#> unlimited+p2 +pemap 39,87
ncm0728.hpc.itc.rwth-aachen.de <28> OMP_STACKSIZE: <#> unlimited+p2 +pemap 13,61
ncm0728.hpc.itc.rwth-aachen.de <29> OMP_STACKSIZE: <#> unlimited+p2 +pemap 16,64
ncm0728.hpc.itc.rwth-aachen.de <2> OMP_STACKSIZE: <#> unlimited+p2 +pemap 24,72
ncm0728.hpc.itc.rwth-aachen.de <30> OMP_STACKSIZE: <#> unlimited+p2 +pemap 37,85
ncm0728.hpc.itc.rwth-aachen.de <31> OMP_STACKSIZE: <#> unlimited+p2 +pemap 40,88
ncm0728.hpc.itc.rwth-aachen.de <32> OMP_STACKSIZE: <#> unlimited+p2 +pemap 14,62
ncm0728.hpc.itc.rwth-aachen.de <33> OMP_STACKSIZE: <#> unlimited+p2 +pemap 17,65
ncm0728.hpc.itc.rwth-aachen.de <34> OMP_STACKSIZE: <#> unlimited+p2 +pemap 38,86
ncm0728.hpc.itc.rwth-aachen.de <35> OMP_STACKSIZE: <#> unlimited+p2 +pemap 41,89
ncm0728.hpc.itc.rwth-aachen.de <36> OMP_STACKSIZE: <#> unlimited+p2 +pemap 18,66
ncm0728.hpc.itc.rwth-aachen.de <37> OMP_STACKSIZE: <#> unlimited+p2 +pemap 21,69
ncm0728.hpc.itc.rwth-aachen.de <38> OMP_STACKSIZE: <#> unlimited+p2 +pemap 42,90
ncm0728.hpc.itc.rwth-aachen.de <39> OMP_STACKSIZE: <#> unlimited+p2 +pemap 45,93
ncm0728.hpc.itc.rwth-aachen.de <3> OMP_STACKSIZE: <#> unlimited+p2 +pemap 27,75
ncm0728.hpc.itc.rwth-aachen.de <40> OMP_STACKSIZE: <#> unlimited+p2 +pemap 19,67
ncm0728.hpc.itc.rwth-aachen.de <41> OMP_STACKSIZE: <#> unlimited+p2 +pemap 22,70
ncm0728.hpc.itc.rwth-aachen.de <42> OMP_STACKSIZE: <#> unlimited+p2 +pemap 43,91
ncm0728.hpc.itc.rwth-aachen.de <43> OMP_STACKSIZE: <#> unlimited+p2 +pemap 46,94
ncm0728.hpc.itc.rwth-aachen.de <44> OMP_STACKSIZE: <#> unlimited+p2 +pemap 20,68
ncm0728.hpc.itc.rwth-aachen.de <45> OMP_STACKSIZE: <#> unlimited+p2 +pemap 23,71
ncm0728.hpc.itc.rwth-aachen.de <46> OMP_STACKSIZE: <#> unlimited+p2 +pemap 44,92
ncm0728.hpc.itc.rwth-aachen.de <47> OMP_STACKSIZE: <#> unlimited+p2 +pemap 47,95
ncm0728.hpc.itc.rwth-aachen.de <4> OMP_STACKSIZE: <#> unlimited+p2 +pemap 1,49
ncm0728.hpc.itc.rwth-aachen.de <5> OMP_STACKSIZE: <#> unlimited+p2 +pemap 4,52
ncm0728.hpc.itc.rwth-aachen.de <6> OMP_STACKSIZE: <#> unlimited+p2 +pemap 25,73
ncm0728.hpc.itc.rwth-aachen.de <7> OMP_STACKSIZE: <#> unlimited+p2 +pemap 28,76
ncm0728.hpc.itc.rwth-aachen.de <8> OMP_STACKSIZE: <#> unlimited+p2 +pemap 2,50
ncm0728.hpc.itc.rwth-aachen.de <9> OMP_STACKSIZE: <#> unlimited+p2 +pemap 5,53
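
(Side note, not part of our script: a generic way to double-check the same
binding, assuming srun and a cgroup/affinity setup like ours, would be to let
each task print its allowed CPU list:

    srun --ntasks=48 bash -c \
        'echo "$(hostname) task ${SLURM_PROCID}: $(grep Cpus_allowed_list /proc/self/status)"'

Each task should then report a core together with its hyperthread sibling,
matching the pemap pairs above.)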


--ntasks=48:

    NodeList=ncm0728
    BatchHost=ncm0728
    NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=48,mem=182400M,node=1,billing=48


--ntasks=48
--ntasks-per-node=24:

    NodeList=ncm[0438-0439]
    BatchHost=ncm0438
    NumNodes=2 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=48,mem=182400M,node=2,billing=48


--ntasks=48
--ntasks-per-node=48:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
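
(Again just a generic cross-check, not output from this job: that refusal
usually means the request does not match what slurmctld thinks the node
offers, so it can be worth comparing against

    scontrol show node ncm0728 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'

to see whether the node is advertised with 48 CPUs (cores) or 96 CPUs
(hardware threads).)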


Isn't the first essentially the same as the last, the only difference being 
that I want to force Slurm to put all tasks onto one node?
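
(One variant I have not tried yet, just as a sketch: requesting the node
count explicitly instead of a per-node task count, i.e.

    #SBATCH --nodes=1
    #SBATCH --ntasks=48

might express "all 48 tasks on one node" without running into the per-node
CPU check; judging from the first output above, --ntasks=48 alone already
lands on a single node anyway.)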



Best
Marcus


On 2/14/19 7:15 AM, Chris Samuel wrote:
> On Wednesday, 13 February 2019 4:48:05 AM PST Marcus Wagner wrote:
>
>> #SBATCH --ntasks-per-node=48
> I wouldn't mind betting that if you set that to 24 it will work, and each
> thread will be assigned a single core with the 2 thread units on it.
>
> All the best,
> Chris

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



