[slurm-users] Running pyMPI on several nodes

John Hearns hearnsj at googlemail.com
Fri Jul 12 06:45:26 UTC 2019


Please try something very simple, such as a hello world program or
srun -N2 -n8 hostname

What error message do you get?
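
As a sketch of the kind of hello world program meant here, the script below uses no MPI at all, so it tests only whether srun can launch tasks on both nodes. It reads SLURM_PROCID, SLURM_NTASKS and SLURM_NODEID, which srun sets for each task. The function name describe_task, the file name hello_slurm.py and the fallback values used outside Slurm are illustrative assumptions, not anything from the original setup.

```python
#!/usr/bin/env python3
"""Hello-world check for a multi-node srun launch; deliberately uses no MPI."""
import os
import socket


def describe_task(env=os.environ):
    """Return one line describing this task, based on variables set by srun."""
    # srun exports SLURM_PROCID, SLURM_NTASKS and SLURM_NODEID to every task.
    # The fallbacks (0, 1, 0) are an assumption so the script also runs
    # outside of Slurm, e.g. directly on a login node.
    rank = int(env.get("SLURM_PROCID", 0))
    ntasks = int(env.get("SLURM_NTASKS", 1))
    node_id = int(env.get("SLURM_NODEID", 0))
    return f"task {rank} of {ntasks} on node {node_id} ({socket.gethostname()})"


if __name__ == "__main__":
    print(describe_task())
```

Saved as hello_slurm.py (hypothetical name), it could be run with
srun -N2 -n8 python3 hello_slurm.py
If that prints eight lines spread over two hostnames, the Slurm launch itself is probably fine and the problem is more likely in the MPI/PMIx layer.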

On Fri, 12 Jul 2019 at 07:07, Pär Lundö <par.lundo at foi.se> wrote:

>
> Hi there Slurm-experts!
> I am having trouble running a python-mpi program involving more than
> one node. The python-mpi program is very simple; it only lists the number
> of ranks that are available in its environment. I have a munge daemon
> running prior to starting the Slurm service, and the program works when
> using a single node (so I suppose munge is working).
> In addition, I have tested a simple sbatch script in which each of the
> four available nodes prints its hostname and returns.
> Since Slurm authenticates via munge, do I need passwordless SSH
> between slurmctld and the nodes? (I found a guide, probably outdated,
> stating that passwordless SSH is a necessity for Slurm,
> HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).
>
> I run the python-mpi program via an sbatch script that invokes an srun
> command. Each node has 8 CPUs.
> The srun command is:
> "srun -N2 -n8 python3 python-mpi.py"
> when tested on two nodes.
> It works fine on a single node (with "-N1" instead of "-N2"), but
> it is aborted or stopped when running on two nodes.
> Should I use "-n16" when running on two nodes? (In order to allocate
> all of the CPUs available on the two nodes.)
> Slurm is configured and built with PMIx.
> The server runs Slurm 19.05 on Ubuntu 18.04, and the nodes run the
> same Slurm version on Ubuntu 18.10.
>
> Best regards,
>
> Palle
>