[slurm-users] Running pyMPI on several nodes

Pär Lundö par.lundo at foi.se
Fri Jul 12 06:04:21 UTC 2019

Hi there Slurm-experts!
I am having trouble running a python-mpi program on more than one node. The python-mpi program is very simple: it only lists the number of ranks available in its environment. A munge daemon is running before I start the Slurm service, and the program works on a single node (so I suppose munge is working).
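For reference, a rank-listing program of this kind can be sketched as below. This is only a minimal sketch that assumes the mpi4py package, which may not be the MPI binding actually used here; the names format_rank_line and main are hypothetical and chosen for illustration.

```python
import sys


def format_rank_line(rank, size, host):
    """Format the one-line report printed by each MPI rank."""
    return f"rank {rank} of {size} on {host}"


def main():
    # Assumption: mpi4py is the Python MPI binding; the original program
    # may use a different one.
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed; nothing to report", file=sys.stderr)
        return

    comm = MPI.COMM_WORLD
    # Every rank reports its own rank, the communicator size and its host.
    print(format_rank_line(comm.Get_rank(), comm.Get_size(),
                           MPI.Get_processor_name()))


if __name__ == "__main__":
    main()
```

When launched through srun, each task should print one line, and the reported size should normally match the value given to -n. If every line reports a size of 1, that usually suggests the MPI library did not connect to Slurm's process manager.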
In addition, I have tested a simple sbatch script in which each available node (four nodes) states its hostname and returns.
Since Slurm authenticates via munge, do I need passwordless SSH communication between slurmctld and the nodes? (I found a guide, probably outdated, stating that passwordless SSH communication is a necessity for Slurm: HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).

I run the python-mpi program via an sbatch script that invokes an srun command. Each node has 8 CPUs.
When tested on two nodes, the srun command is:
    srun -N2 -n8 python3 python-mpi.py
It works fine on a single node (with "-N1" instead of "-N2"), but it is aborted or stopped when running on two nodes.
Should I use "-n16" when running on two nodes (in order to allocate all the CPUs available on the two nodes)?
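The arithmetic behind that question can be sketched as follows. The -n option counts tasks for the whole job rather than per node, so, assuming one task per CPU and no --cpus-per-task setting, filling two 8-CPU nodes would take 2 x 8 = 16 tasks. The helper srun_command is a hypothetical name used only for illustration.

```python
def srun_command(nodes, cpus_per_node, program):
    """Build an srun argument list that requests one task per CPU.

    Assumption: one task per CPU, so the total task count passed to -n
    is simply nodes * cpus_per_node.
    """
    ntasks = nodes * cpus_per_node
    return ["srun", f"-N{nodes}", f"-n{ntasks}", *program]


if __name__ == "__main__":
    # Two nodes with 8 CPUs each, as in the setup described above.
    cmd = srun_command(2, 8, ["python3", "python-mpi.py"])
    print(" ".join(cmd))  # prints: srun -N2 -n16 python3 python-mpi.py
```

A request of -N2 -n8 is still valid under the same assumption; it simply asks for fewer tasks than the two nodes have CPUs.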
Slurm is configured and built with PMIx.
I am running Slurm 19.05 on Ubuntu 18.04 on the server, and the nodes are running the same Slurm version on Ubuntu 18.10.

Best regards,

