[slurm-users] Running pyMPI on several nodes
Mark Hahn
hahn at mcmaster.ca
Fri Jul 12 13:53:07 UTC 2019
> I am having trouble running a python-mpi program on more than one
> node. The python-mpi program is very simple,
do you think there's something unique about the python program?
(also, you mean mpi4py, right?)
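One way to answer that is to take MPI out of the picture entirely. Below is a minimal sketch (the file name where_am_i.py is hypothetical) that imports no MPI at all. It only prints the per-task variables srun sets (SLURM_PROCID, SLURM_NTASKS, SLURMD_NODENAME) and falls back to the hostname if the node name is missing. If "srun -N2 -n8 python3 where_am_i.py" prints eight lines spread over both nodes, Slurm's launch path is probably fine and the failure is more likely in the MPI layer.

```python
#!/usr/bin/env python3
"""Report where srun placed each task, without importing MPI.

Hypothetical usage: srun -N2 -n8 python3 where_am_i.py
"""
import os
import socket


def describe_task(env):
    """Build a one-line summary from the Slurm variables srun sets per task."""
    rank = env.get("SLURM_PROCID", "?")      # global task rank
    ntasks = env.get("SLURM_NTASKS", "?")    # total tasks in the step
    # Fall back to the local hostname if SLURMD_NODENAME is not set.
    node = env.get("SLURMD_NODENAME", socket.gethostname())
    return f"task {rank}/{ntasks} on {node}"


if __name__ == "__main__":
    # flush so output from all tasks shows up promptly in srun's stdout
    print(describe_task(os.environ), flush=True)
```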
> Since Slurm uses munge for authentication, do I need passwordless
> SSH communication between slurmctld and the nodes?
no, you don't need it. the combination of slurmd (the actual spawning)
and munge (for credentials/authentication) is how slurmctld starts jobs.
> (I found a guide, probably outdated, stating that passwordless SSH
> communication is a necessity for Slurm:
> HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).
I suspect that's an editing slip: you do usually want mutual passwordless SSH
access among user-accessible nodes (login and compute nodes, but not usually
admin hosts like the slurmctld or slurmdbd nodes).
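If you want to check that mutual access, a rough sketch is to try a non-interactive ssh from one user-accessible node to each of the others. BatchMode=yes makes ssh fail instead of prompting for a password, so any host that would need one is reported as unreachable. The hostnames and the script name check_ssh.py are placeholders; run it as an ordinary user on each login and compute node.

```python
import subprocess


def ssh_ok(host, run=subprocess.run):
    """Return True if 'ssh host true' succeeds without any prompt.

    BatchMode=yes disables password prompts; ConnectTimeout avoids
    hanging on a dead node.
    """
    result = run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(hosts, run=subprocess.run):
    """List the hosts that cannot be reached by passwordless ssh."""
    return [h for h in hosts if not ssh_ok(h, run)]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 check_ssh.py node01 node02
    for host in unreachable(sys.argv[1:]):
        print(f"no passwordless ssh to {host}")
```

The run parameter is there only so the check can be exercised without a real cluster; in normal use it is subprocess.run.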
> "srun -N2 -n8 python3 python-mpi.py",
using srun does not depend on ssh. if you use mpirun/mpiexec, it *might*
depend on ssh (but only among the compute nodes).
> It works fine running on a single node (with "-N1" instead of "-N2"), but it is aborted or stopped when running on two nodes.
I would guess you need to look at slurmd logs on the nodes.
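A crude sketch for pulling the relevant lines out of a slurmd log on each node: the log location is whatever SlurmdLogFile is set to in slurm.conf, so the default path below is only a placeholder, and matching on the word "error" plus the job id is a heuristic, not a parser for Slurm's log format. The script name slurmd_errors.py is hypothetical.

```python
import re
from pathlib import Path

# Placeholder path: the real location is whatever SlurmdLogFile is set to
# in slurm.conf on each compute node.
DEFAULT_LOG = "/var/log/slurmd.log"


def error_lines(lines, job_id=None):
    """Yield lines containing 'error' (any case), optionally only those that
    also mention job_id as a whole number (so 1234 does not match 12345)."""
    job_re = re.compile(rf"\b{re.escape(str(job_id))}\b") if job_id is not None else None
    for line in lines:
        if "error" not in line.lower():
            continue
        if job_re is not None and not job_re.search(line):
            continue
        yield line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 slurmd_errors.py /path/to/slurmd.log 1234
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(DEFAULT_LOG)
    job = sys.argv[2] if len(sys.argv) > 2 else None
    if not path.exists():
        print(f"log file not found: {path}")
    else:
        with path.open(errors="replace") as fh:
            for line in error_lines(fh, job):
                print(line)
```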
regards, mark hahn.