[slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

Alessandro Federico a.federico at cineca.it
Mon Jan 15 05:33:15 MST 2018


Hi John

thanks for the info. 
slurmctld doesn't report anything about the server thread count in the logs,
and sdiag shows only 3 server threads.
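
A minimal sketch of how the thread count check could be scripted, assuming
sdiag prints a line of the form "Server thread count: N". The function name
and the sample text are hypothetical and only illustrate the parsing step:

```python
import re
from typing import Optional


def server_thread_count(sdiag_output: str) -> Optional[int]:
    """Return the number on the 'Server thread count' line, or None if absent."""
    match = re.search(
        r"^\s*Server thread count:\s*(\d+)", sdiag_output, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Illustrative text only, not captured from a real cluster.
sample = "Server thread count: 3\nAgent queue size:    0\n"
print(server_thread_count(sample))
```

On a live system the text could come from
subprocess.run(["sdiag"], capture_output=True, text=True).stdout instead of
the sample string.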

We changed the MessageTimeout value to 20.
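
As a rough way to confirm which value a given slurm.conf sets, the sketch
below reads the file text and returns MessageTimeout, falling back to the
default of 10 mentioned in this thread. It is a simplified parser: it assumes
one "Key=Value" setting per line and "#" comments, and the function name is
hypothetical:

```python
def message_timeout(conf_text: str, default: int = 10) -> int:
    """Return MessageTimeout from slurm.conf text, or the default if unset."""
    value = default
    for raw_line in conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, val = line.partition("=")
        # Parameter names are matched case-insensitively here.
        if sep and key.strip().lower() == "messagetimeout":
            value = int(val.strip())  # a later setting replaces an earlier one
    return value


# Illustrative fragment only, not a complete configuration.
conf = "# timeouts\nMessageTimeout=20\n"
print(message_timeout(conf))
```

After editing the real file, the change still has to be picked up by the
daemons, for example with "scontrol reconfigure".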

I'll let you know if it solves the problem.

Thanks
ale

----- Original Message -----
> From: "John DeSantis" <desantis at usf.edu>
> To: "Alessandro Federico" <a.federico at cineca.it>
> Cc: slurm-users at lists.schedmd.com, "Isabella Baccarelli" <i.baccarelli at cineca.it>, hpc-sysmgt-info at cineca.it
> Sent: Friday, January 12, 2018 7:58:38 PM
> Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation
> 
> Ciao Alessandro,
> 
> > Do we have to apply any particular setting to avoid incurring the
> > problem?
> 
> What is your "MessageTimeout" value in slurm.conf?  If it's at the
> default of 10, try changing it to 20.
> 
> I'd also check and see if the slurmctld log is reporting anything
> pertaining to the server thread count being over its limit.
> 
> HTH,
> John DeSantis
> 
> On Fri, 12 Jan 2018 11:32:57 +0100
> Alessandro Federico <a.federico at cineca.it> wrote:
> 
> > Hi all,
> > 
> > 
> > we are setting up SLURM 17.11.2 on a small test cluster of about 100
> > nodes. Sometimes we get the error in the subject when running any
> > SLURM command (e.g. sinfo, squeue, scontrol reconf, etc...)
> > 
> > 
> > Do we have to apply any particular setting to avoid incurring the
> > problem?
> > 
> > 
> > We found this bug report
> > https://bugs.schedmd.com/show_bug.cgi?id=4002 but it concerns the
> > previous SLURM version and we do not set debug3 on slurmctld.
> > 
> > 
> > thanks in advance
> > ale
> > 
> 
> 

-- 
Alessandro Federico 
HPC System Management Group 
System & Technology Department 
CINECA www.cineca.it 
Via dei Tizii 6, 00185 Rome - Italy 
phone: +39 06 44486708 
