[slurm-users] Jobs stuck in "completing" (CG) state

Chris Samuel chris at csamuel.org
Sat Oct 24 18:19:43 UTC 2020


On 10/24/20 9:22 am, Kimera Rodgers wrote:

> [root@kla-ac-ohpc-01 critical]# srun -c 8 --pty bash -i
> srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
> srun: error: Task launch for 37.0 failed on node c-node3: Socket timed out on send/recv operation
> srun: error: Application launch failed: Socket timed out on send/recv operation
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

To me this looks like a networking issue, perhaps firewall/iptables rules 
blocking connections between the host running srun and c-node3.
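
One way to check this is sketched below. It assumes bash (for /dev/tcp) and 
the coreutils `timeout` command are available, and that slurmd is listening 
on the default SlurmdPort of 6818; if slurm.conf sets a different port, use 
that instead. The helper name check_port is made up for illustration. Keep in 
mind that the compute node also connects back to srun on the submit host 
(ephemeral ports unless SrunPortRange is set), so rules on either side can 
cause this kind of timeout.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a TCP port on a host accepts connections.
# check_port is a hypothetical helper, not part of Slurm.

check_port() {
    local host="$1" port="$2"
    # Try to open a TCP connection via bash's /dev/tcp, giving up after 5 seconds.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} NOT reachable"
        return 1
    fi
}

# Example usage (c-node3 is the node named in the error output above):
#   check_port c-node3 6818               # from the submit host: can we reach slurmd?
#   scontrol show config | grep -i port   # confirm which ports are actually configured
#   iptables -L -n                        # as root on both hosts: look for DROP/REJECT rules
```

If the port shows as not reachable while slurmd is running on the node, the 
firewall rules on that node are the first thing I would look at.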

Best of luck,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


