[slurm-users] Running pyMPI on several nodes

Mark Hahn hahn at mcmaster.ca
Fri Jul 12 13:53:07 UTC 2019


> I am having trouble running a python-mpi program involving more than one
> node. The python-mpi program is very simple,

do you think there's something unique about the python program?
(also, you mean mpi4py, right?)

> Since Slurm authenticates via munge, do I need passwordless
> SSH communication between slurmctld and the nodes? (I found a

no, you don't need it.  the combination of slurmd (the actual spawning)
and munge (for credentials/authentication) is how slurmctld starts jobs.

> guide, probably outdated, stating that passwordless SSH communication is a
> necessity for slurm,
> HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).

I suspect that's an editing slip: you do usually want mutual ssh access among
user-accessible nodes (login, compute), but not usually admin things like
slurmctld or slurmdb nodes.

> "srun -N2 -n8 python3 python-mpi.py",

using srun does not depend on ssh.  if you use mpirun/mpiexec, it *might*
depend on ssh (but only among the compute nodes).
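
one way to separate "srun can't launch on two nodes" from "MPI can't talk
between two nodes" is to launch something that uses no MPI at all.  a minimal
sketch, assuming only the per-task environment variables srun documents
(SLURM_PROCID, SLURM_NTASKS, SLURMD_NODENAME); the file name whereami.py and
the function name describe_task are made up for illustration:

```python
import os
import socket


def describe_task(env=os.environ):
    """Summarise where srun placed this task, using only Slurm's env vars."""
    # SLURM_PROCID: this task's rank within the job step.
    rank = env.get("SLURM_PROCID", "?")
    # SLURM_NTASKS: total number of tasks requested (-n).
    ntasks = env.get("SLURM_NTASKS", "?")
    # SLURMD_NODENAME: node running this task; fall back to the hostname
    # when the script is run outside of Slurm.
    node = env.get("SLURMD_NODENAME") or socket.gethostname()
    return f"task {rank} of {ntasks} on {node}"


if __name__ == "__main__":
    print(describe_task())
```

run it as "srun -N2 -n8 python3 whereami.py".  if tasks report from both
nodes, the launch path (slurmctld, slurmd, munge) is working and the problem
is more likely in the MPI layer.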

> It works fine running on a single node (with "-N1" instead of "-N2"),
> but it is aborted or stopped when running on two nodes.

I would guess you need to look at slurmd logs on the nodes.
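
the slurmd log location is whatever SlurmdLogFile is set to in slurm.conf
("scontrol show config" will print it).  a rough sketch for pulling the
relevant lines out of a large log; the keyword list and the helper name
matching_lines are assumptions, not anything Slurm defines, and the job id
match is a crude substring test:

```python
import sys


def matching_lines(lines, job_id, keywords=("error", "fail")):
    """Return lines that mention job_id or any keyword (case-insensitive)."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if job_id in line or any(word in lowered for word in keywords):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # usage: python3 scanlog.py <path-to-slurmd-log> <jobid>
    if len(sys.argv) == 3:
        log_path, job_id = sys.argv[1], sys.argv[2]
        with open(log_path, errors="replace") as handle:
            for hit in matching_lines(handle, job_id):
                print(hit)
    else:
        print("usage: scanlog.py <path-to-slurmd-log> <jobid>", file=sys.stderr)
```

run it on each of the two compute nodes, since each slurmd writes its own log.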

regards, mark hahn.


