[slurm-users] Intel MPI issue with slurm sbatch
Joe Teumer
joe.teumer at gmail.com
Tue Aug 16 16:09:43 UTC 2022
Hello!
Is there a way to turn off slurm MPI hooks?
When a job is submitted via sbatch and runs Intel MPI, the thread affinity
settings come out incorrect.
However, running the same MPI job manually over SSH works and all bindings
are correct.
We would like to run our MPI jobs via slurm sbatch and get the same
behavior as when running them manually over SSH; a sketch of the
submission script is included below.
slurmd -V
slurm 22.05.3
RUNNING OMP_NUM_THREADS=, cmd=numactl -C 0-63,128-191 -m 0 mpirun -verbose
-genv I_MPI_DEBUG=4 -genv KMP_AFFINITY=verbose,granularity=fine,compact -np
64 -ppn 64 ./mpiprogram -in in.program -log program -pk intel 0 omp 2 -sf
intel -screen none -v d 1
which mpirun
/opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/mpirun
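For reference, the job is submitted with a batch script roughly like the
sketch below; the #SBATCH resource requests, job name, and environment
setup line are placeholders rather than our exact script, and the mpirun
line is the same one logged above.

#!/bin/bash
#SBATCH --job-name=impi-affinity      # placeholder name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64          # placeholder; matches -np/-ppn 64 below
#SBATCH --exclusive

# Assumed environment setup for the PSXE 2019 Intel MPI runtime;
# adjust if mpivars.sh lives elsewhere in the install.
source /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/mpivars.sh

# Same command as in the log above: pin to cores 0-63,128-191 and bind
# memory to NUMA node 0.
numactl -C 0-63,128-191 -m 0 mpirun -verbose \
    -genv I_MPI_DEBUG=4 \
    -genv KMP_AFFINITY=verbose,granularity=fine,compact \
    -np 64 -ppn 64 ./mpiprogram -in in.program -log program \
    -pk intel 0 omp 2 -sf intel -screen none -v d 1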
slurm sbatch:
[mpiexec at node] Launch arguments: /usr/local/bin/srun -N 1 -n 1
--ntasks-per-node 1 --nodelist node --input none
/opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_bstrap_proxy
--upstream-host
node --upstream-port 45427 --pgid 0 --launcher slurm --launcher-number 1
--base-path /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/
--tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug
/opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_pmi_proxy
--usize -1 --auto-cleanup 1 --abort-signal 9
SSH manual run:
[mpiexec at node] Launch arguments:
/opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_bstrap_proxy
--upstream-host
node --upstream-port 35747 --pgid 0 --launcher ssh --launcher-number 0
--base-path /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/
--tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug
--proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7
/opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_pmi_proxy
--usize -1 --auto-cleanup 1 --abort-signal 9
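The only difference we can spot is --launcher slurm under sbatch versus
--launcher ssh for the manual run. One workaround we are considering,
assuming this 2019 runtime honors Intel MPI's documented
I_MPI_HYDRA_BOOTSTRAP variable, is to force the ssh bootstrap from inside
the batch script:

# Tell Hydra to bootstrap its proxies over ssh instead of srun.
# Assumes the documented I_MPI_HYDRA_BOOTSTRAP / I_MPI_HYDRA_BOOTSTRAP_EXEC
# variables are honored by the psxe_runtime_2019.6.324 mpirun.
export I_MPI_HYDRA_BOOTSTRAP=ssh
export I_MPI_HYDRA_BOOTSTRAP_EXEC=/usr/bin/ssh

numactl -C 0-63,128-191 -m 0 mpirun ... (same command as above)

but we would prefer a supported way to disable the hook on the slurm side
if one exists, hence the question above.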