[slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

John DeSantis desantis at usf.edu
Fri Jan 12 11:58:38 MST 2018


Hi Alessandro,

> Do we have to apply any particular setting to avoid incurring the
> problem? 

What is your "MessageTimeout" value in slurm.conf? If it is still at
the default of 10 seconds, try raising it to 20.
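
Here is a minimal Python sketch of that check. It assumes the slurm.conf text has already been read into a string. It looks up MessageTimeout, falls back to the 10-second default when the parameter is absent or commented out, and applies the same rule of thumb of going from 10 to 20. The function names are hypothetical. Treating the last assignment in the file as the effective one is an assumption of this sketch, not documented Slurm behaviour.

```python
import re
from typing import Optional

# Default MessageTimeout in slurm.conf, in seconds.
DEFAULT_MESSAGE_TIMEOUT = 10

# Parameter names in slurm.conf are case-insensitive; \b avoids matching
# inside a longer parameter name.
_MSG_TIMEOUT_RE = re.compile(r"(?i)\bMessageTimeout\s*=\s*(\d+)")


def message_timeout(conf_text: str) -> int:
    """Return the MessageTimeout set in slurm.conf text, or the default.

    Hypothetical helper; assumes the last assignment wins if the
    parameter appears more than once.
    """
    value: Optional[int] = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        match = _MSG_TIMEOUT_RE.search(line)
        if match:
            value = int(match.group(1))
    return DEFAULT_MESSAGE_TIMEOUT if value is None else value


def suggested_timeout(conf_text: str) -> int:
    """Suggest 20 s if the timeout is at (or below) the default, else keep it."""
    current = message_timeout(conf_text)
    return 20 if current <= DEFAULT_MESSAGE_TIMEOUT else current
```

For example, `suggested_timeout(open("slurm.conf").read())` returns 20 for a file that leaves MessageTimeout unset. The path there is only a placeholder for wherever your slurm.conf lives.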

I'd also check whether the slurmctld log reports the server thread
count going over its limit.
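
Here is a minimal Python sketch for that log check. The exact wording of the slurmctld message is an assumption here. The sketch searches for the substring "server_thread_count over limit", so compare it against what your own log actually contains. The function name is hypothetical.

```python
from typing import Iterable, List

# Assumed substring of the slurmctld message; verify against your own log.
THREAD_LIMIT_MARKER = "server_thread_count over limit"


def thread_limit_warnings(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the server thread limit being hit."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if THREAD_LIMIT_MARKER in line
    ]
```

Pass it an open file handle for the log configured as SlurmctldLogFile in slurm.conf. If the result is non-empty and the timestamps line up with the socket timeouts, slurmctld was likely saturated with RPCs at those moments.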

HTH,
John DeSantis

On Fri, 12 Jan 2018 11:32:57 +0100
Alessandro Federico <a.federico at cineca.it> wrote:

> Hi all,
>
> we are setting up SLURM 17.11.2 on a small test cluster of about 100
> nodes. Sometimes we get the error in the subject when running any
> SLURM command (e.g. sinfo, squeue, scontrol reconf, etc.).
>
> Do we have to apply any particular setting to avoid incurring the
> problem?
>
> We found this bug report
> https://bugs.schedmd.com/show_bug.cgi?id=4002 but it concerns the
> previous SLURM version, and we do not set debug3 on slurmctld.
>
> Thanks in advance,
> ale