[slurm-users] getting closer

Valerio Bellizzomi valerio at selnet.org
Fri Jun 28 07:57:07 UTC 2019


On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote:
> On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
> > On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
> >> On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> >>> The nodes are now communicating; however, when I run the command
> >>>
> >>> srun -w compute02 /bin/ls
> >>>
> >>> it remains stuck and there is no output on the submit machine.
> >>>
> >>> On compute02 there is a communication error and a timeout.
> >>>
> >>> The network ports 6817 and 6818 are open.
> >>
> >>
> >> Looking at the firewall logs, slurmctld wants to connect back to a range
> >> of ports which are closed.
> > 
> > 
> > As a test I stopped the firewall service on the submit machine; now the
> > command above works fine.
> 
> You may want to check your firewall settings according to Slurm's 
> requirements.  I've summarized this in my Wiki page:
> 
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
> 
> 
> /Ole
> 

I am using another system and another firewall.
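
A possible way to keep the firewall running, as a minimal sketch: srun
opens listening ports on the submit machine and the Slurm daemons
connect back to them. By default these ports are ephemeral, which would
explain the closed port range seen in the firewall logs. The
SrunPortRange parameter in slurm.conf pins them to a fixed range. The
range below is an assumed example, not a recommended value.

```
# slurm.conf (kept identical on all nodes)
# Assumed example range; choose one that is free on the submit machine.
SrunPortRange=60001-63000
```

The same TCP range then has to be allowed inbound on the submit machine,
from the controller and the compute nodes, in whatever firewall is in
use.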

Also, the GRES definition is different:

Generic resources (GRES) and GPUs for Debian with ROCm:
Name=gpus Type=vega File=/dev/dri/card[n]
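
A gres.conf entry like the one above normally needs a matching
declaration in slurm.conf. A minimal sketch, assuming the GRES name
"gpus" from the line above and a single device on compute02 (the count
is an assumption):

```
# slurm.conf
GresTypes=gpus
# Other NodeName parameters (CPUs, RealMemory, ...) are omitted here;
# the count of 1 is assumed for illustration.
NodeName=compute02 Gres=gpus:vega:1
```

With that in place, a job could request the device with something like
srun --gres=gpus:vega:1 -w compute02 /bin/ls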





