[slurm-users] Inconsistent cpu bindings with cpu-bind=none
Marcus Boden
mboden at gwdg.de
Mon Feb 17 08:48:35 UTC 2020
Hi everyone,
I am facing a bit of a weird issue with CPU bindings and mpirun:
My jobscript:
#!/bin/bash
#SBATCH -N 20
#SBATCH --tasks-per-node=40
#SBATCH -p medium40
#SBATCH -t 30
#SBATCH -o out/%J.out
#SBATCH -e out/%J.err
#SBATCH --reservation=root_98
module load impi/2019.4 2>&1
export I_MPI_DEBUG=6
export SLURM_CPU_BIND=none
. /sw/comm/impi/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh release
BENCH=/sw/comm/impi/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1
mpirun -np 800 $BENCH -npmin 800 -iter 50 -time 120 -msglog 16:18 -include Allreduce Bcast Barrier Exchange Gather PingPing PingPong Reduce Scatter Allgather Alltoall Reduce_scatter
My output is as follows:
[...]
[0] MPI startup(): 37 154426 gcn1311 {37,77}
[0] MPI startup(): 38 154427 gcn1311 {38,78}
[0] MPI startup(): 39 154428 gcn1311 {39,79}
[0] MPI startup(): 40 161061 gcn1312 {0}
[0] MPI startup(): 41 161062 gcn1312 {40}
[0] MPI startup(): 42 161063 gcn1312 {0}
[0] MPI startup(): 43 161064 gcn1312 {40}
[0] MPI startup(): 44 161065 gcn1312 {0}
[...]
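To see at a glance which nodes are affected, the pinning lines can be summarized per host. The following is only a minimal sketch: the function name pinning_summary is made up for illustration, it assumes the line layout shown above (rank, PID, host name, then the CPU set in braces), and it uses "the set contains exactly one CPU" as a heuristic for a rank that was squeezed into an already restricted cpuset.

```shell
# Summarize Intel MPI pinning lines (I_MPI_DEBUG output) per host.
# Reads the files given as arguments, or stdin if none are given.
# Hypothetical helper; assumes the column layout seen in the output above.
pinning_summary() {
    awk '
        /MPI startup\(\):/ {
            lb = index($0, "{")
            rb = index($0, "}")
            if (lb == 0 || rb <= lb) next      # not a pinning line
            host = $6                          # assumed: 6th field is the host
            # Count the comma-separated CPUs between the braces
            n = split(substr($0, lb + 1, rb - lb - 1), cpus, ",")
            total[host]++
            if (n == 1) single[host]++
        }
        END {
            for (h in total)
                printf "%s ranks=%d single_cpu=%d\n", h, total[h], single[h] + 0
        }
    ' "$@" | sort
}
```

It could be called as pinning_summary out/JOBID.out, where JOBID stands for the actual job ID. A host on which single_cpu equals ranks would be one of the suspicious nodes.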
On 8 of the 20 nodes the pinning was wrong. In the slurmd logs I found
that on the nodes where the pinning was correct, the manual binding was
communicated correctly:
lllp_distribution jobid [2065227] manual binding: none
On the nodes where it did not work, the default binding was applied instead:
lllp_distribution jobid [2065227] default auto binding: cores, dist 1
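The two kinds of log lines above can be counted per binding mode with a small filter. This is a sketch under assumptions: the function name binding_modes is hypothetical, and it relies only on the "lllp_distribution jobid [ID] ..." text shown above, ignoring whatever timestamp prefix the real log lines carry.

```shell
# Count the binding decisions that slurmd logged for one job.
# Usage: binding_modes JOBID [logfile ...]   (reads stdin if no file is given)
# Hypothetical helper; matches only the message text quoted above.
binding_modes() {
    jobid=$1
    shift
    awk -v id="$jobid" '
        {
            tag = "lllp_distribution jobid [" id "] "
            pos = index($0, tag)
            # Keep everything after the tag, e.g. "manual binding: none"
            if (pos > 0) count[substr($0, pos + length(tag))]++
        }
        END {
            for (m in count) printf "%d %s\n", count[m], m
        }
    ' "$@" | sort
}
```

The slurmd log is local to each node and its location is site-specific (see SlurmdLogFile in slurm.conf), so the function would have to be run on every node of the job, or on the collected logs.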
So, for some reason, Slurm applied CPU binding to the tasks on some
nodes, while on other nodes the CPU binding was (correctly) disabled.
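One way to check this independently of MPI would be to let every node report the CPU list a job step is actually allowed to use. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: the function names and the file name affinity.sh are hypothetical, and it reads the Linux-specific Cpus_allowed_list field from /proc/self/status.

```shell
# Count the CPUs in a Linux CPU list such as "0-79" or "0,40".
count_cpus() {
    printf '%s\n' "$1" | awk -F, '
        {
            n = 0
            for (i = 1; i <= NF; i++) {
                # "a-b" is a range of b - a + 1 CPUs, anything else is one CPU
                if (split($i, r, "-") == 2) n += r[2] - r[1] + 1
                else n += 1
            }
            print n
        }'
}

# Print "<hostname> <allowed CPU list> <CPU count>" for the calling process.
report_affinity() {
    list=$(awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status)
    printf '%s %s %s\n' "$(hostname)" "$list" "$(count_cpus "$list")"
}

# Possible use inside the job script, with the functions saved in a
# hypothetical file affinity.sh:
#   srun --ntasks-per-node=1 bash -c '. ./affinity.sh; report_affinity'
```

With binding disabled, each node would be expected to report all of its CPUs. A node that reports only a couple of CPUs (for example 0,40) would presumably be one where the default core binding was applied.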
Any ideas what could cause this?
Best,
Marcus
--
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience
Tel.: +49 (0)551 201-2191
E-Mail: mboden at gwdg.de
---------------------------------------
Gesellschaft fuer wissenschaftliche
Datenverarbeitung mbH Goettingen (GWDG)
Am Fassberg 11, 37077 Goettingen
URL: http://www.gwdg.de
E-Mail: gwdg at gwdg.de
Tel.: +49 (0)551 201-1510
Fax: +49 (0)551 201-2150
Geschaeftsfuehrer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender:
Prof. Dr. Christian Griesinger
Sitz der Gesellschaft: Goettingen
Registergericht: Goettingen
Handelsregister-Nr. B 598
---------------------------------------