[slurm-users] Not able to allocate all 24 ntasks-per-node; slurm.conf appears correct
Anne M. Hammond
hammond at txcorp.com
Wed Mar 27 23:35:31 UTC 2019
Thanks. A second user cannot allocate any tasks on the node
which is running 12 processes.
So it does look like Slurm is tying processes to physical cores.
Also interesting: top shows all 24 CPUs at ~95%:
%Cpu0 : 93.4 us, 0.7 sy, 0.0 ni, 6.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 94.4 us, 0.3 sy, 0.0 ni, 5.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 95.3 us, 1.0 sy, 0.0 ni, 3.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 93.7 us, 0.7 sy, 0.0 ni, 5.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 95.7 us, 0.7 sy, 0.0 ni, 3.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 94.4 us, 1.0 sy, 0.0 ni, 4.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 94.0 us, 0.7 sy, 0.0 ni, 5.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 96.3 us, 0.3 sy, 0.0 ni, 3.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 92.7 us, 0.3 sy, 0.0 ni, 7.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 93.4 us, 0.7 sy, 0.0 ni, 6.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 94.7 us, 1.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 95.4 us, 1.0 sy, 0.0 ni, 3.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 95.7 us, 0.3 sy, 0.0 ni, 4.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 93.7 us, 0.7 sy, 0.0 ni, 5.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 94.0 us, 0.3 sy, 0.0 ni, 5.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 93.7 us, 0.7 sy, 0.0 ni, 5.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 93.4 us, 0.3 sy, 0.0 ni, 6.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 94.4 us, 0.7 sy, 0.0 ni, 5.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 95.7 us, 0.7 sy, 0.0 ni, 3.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 94.0 us, 1.0 sy, 0.0 ni, 5.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 93.7 us, 0.7 sy, 0.0 ni, 5.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 95.0 us, 0.7 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 94.7 us, 0.7 sy, 0.0 ni, 4.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 95.3 us, 1.0 sy, 0.0 ni, 3.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
and %CPU is ~195 for each of the 12 running processes.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27166 hammond 20 0 868296 83692 51544 R 196.0 0.2 304:20.39 executable
27155 hammond 20 0 878744 93924 51976 R 195.0 0.2 299:55.16 executable
27162 hammond 20 0 877072 72800 51244 R 192.4 0.1 302:06.48 executable
27156 hammond 20 0 877220 72724 51244 R 192.0 0.1 300:39.33 executable
27163 hammond 20 0 877844 77564 51588 R 191.4 0.2 302:25.27 executable
27160 hammond 20 0 877040 75332 51640 R 190.7 0.2 301:38.85 executable
27164 hammond 20 0 876808 76060 51628 R 190.7 0.2 303:07.78 executable
27159 hammond 20 0 878240 74220 51872 R 190.4 0.2 301:33.03 executable
27165 hammond 20 0 876860 75300 51540 R 190.4 0.2 303:27.36 executable
27161 hammond 20 0 877388 76116 51712 S 190.0 0.2 302:00.60 executable
27158 hammond 20 0 877072 72576 51304 R 188.7 0.1 301:44.78 executable
27157 hammond 20 0 877388 72992 51684 R 187.7 0.1 301:34.32 executable
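As a rough cross-check of the two listings above, here is a minimal Python sketch that sums the %CPU column of the 12 processes and the %us column of the 24 CPUs. The values are copied from the top output; the helper name busy_cpus is just for illustration. If each task is bound to one physical core that has two hardware threads, and the executable runs more than one thread, then about 2 logical CPUs per process is what one would expect, but that reading is an assumption and not something top itself confirms.

```python
# %CPU column for the 12 "executable" processes, copied from top.
process_cpu = [196.0, 195.0, 192.4, 192.0, 191.4, 190.7,
               190.7, 190.4, 190.4, 190.0, 188.7, 187.7]

# %us (user time) for Cpu0..Cpu23, copied from top.
cpu_user = [93.4, 94.4, 95.3, 93.7, 95.7, 94.4, 94.0, 96.3,
            92.7, 93.4, 94.7, 95.4, 95.7, 93.7, 94.0, 93.7,
            93.4, 94.4, 95.7, 94.0, 93.7, 95.0, 94.7, 95.3]


def busy_cpus(percentages):
    """Convert a list of percentages into an equivalent number of fully busy CPUs."""
    return sum(percentages) / 100.0


if __name__ == "__main__":
    per_process = busy_cpus(process_cpu) / len(process_cpu)
    print(f"processes account for about {busy_cpus(process_cpu):.1f} logical CPUs")
    print(f"per-CPU user time adds up to about {busy_cpus(cpu_user):.1f} logical CPUs")
    print(f"average per process: about {per_process:.2f} logical CPUs")
```

Both totals come out close to 23 of the 24 logical CPUs, so the 12 processes account for essentially all of the load shown in the per-CPU lines.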
The node is running CentOS Linux release 7.6.1810 (Core).
I will look into physical cores vs threads.
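One way to check this on the node is sketched below in Python, assuming the standard Linux sysfs topology files are present. Each logical CPU lists its hardware-thread siblings in thread_siblings_list, so the number of distinct sibling lists is the number of physical cores. The function names are hypothetical, and this says nothing about what slurm.conf declares for the node.

```python
import glob

# Standard Linux sysfs location; one file per logical CPU.
SIBLINGS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"


def read_sibling_lists(pattern=SIBLINGS_GLOB):
    """Return the raw sibling-list string for every logical CPU found."""
    lists = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            lists.append(fh.read())
    return lists


def count_physical_cores(sibling_lists):
    """Logical CPUs on the same core share an identical sibling list,
    so the number of distinct lists equals the number of physical cores."""
    return len({s.strip() for s in sibling_lists})


if __name__ == "__main__":
    lists = read_sibling_lists()
    print(f"logical CPUs:   {len(lists)}")
    print(f"physical cores: {count_physical_cores(lists)}")
```

If this reports 24 logical CPUs and 12 physical cores, the node has two hardware threads per core, which would fit the behavior seen above.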
Thanks again,