[slurm-users] sstat -a: Socket timed out on send/recv operation

Angel de Vicente angel.de.vicente at iac.es
Tue Jul 11 21:07:36 UTC 2023


Hello,

While trying to get some stats about my running jobs, I've realized that
sstat consistently fails for one of them with:

,----
| sstat: error: slurm_receive_msgs: [[----]:6818] failed: Socket timed out on send/recv operation
| sstat: error: slurm_job_step_stat: unknown return given from ----.ll.iac.es: 9001 rc = Communication connection failure
| sstat: error: problem getting step_layout for StepId=249974.batch: Communication connection failure
`----

Running "sstat" against the other running jobs is not a problem, though
the time it takes to get the results varies a lot from one job to
another.
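
The variation in response time could be measured with something like
the loop below. This is only a rough sketch: it assumes that squeue and
sstat are on the PATH and that $USER is the job owner. The helper name
time_sstat is made up for illustration and is not part of Slurm.

```shell
# Hypothetical helper: run "sstat -a" for one job and report its
# exit code and the elapsed wall-clock time in whole seconds.
time_sstat() {
    local jobid=$1
    local start end rc
    start=$(date +%s)
    sstat -a -j "$jobid" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "$jobid rc=$rc elapsed=$((end - start))s"
}

# Only query the cluster when the Slurm client tools are available.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header, -t RUNNING keeps running jobs, -o %i prints job IDs.
    for jobid in $(squeue -h -u "$USER" -t RUNNING -o %i); do
        time_sstat "$jobid"
    done
fi
```

A job for which sstat hits the socket timeout should show up with a
non-zero rc and a noticeably larger elapsed time than the others.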

Is there some timeout variable that I can modify to allow more time for
the sstat command to finish?

Cheers,
-- 
Ángel de Vicente
 Research Software Engineer (Supercomputing and BigData)
 Tel.: +34 922-605-747
 Web.: http://research.iac.es/proyecto/polmag/

 GPG: 0x8BDC390B69033F52