[slurm-users] srun : Communication connection failure
Durai Arasan
arasan.durai at gmail.com
Thu Jan 20 14:40:33 UTC 2022
Hello Slurm users,
We are suddenly seeing errors when we try to launch interactive jobs on
our CPU partitions. Has anyone encountered this problem before? Kindly
let us know.
[darasan84@bg-slurmb-login1 ~]$ srun --job-name "admin_test231" --ntasks=1 \
    --nodes=1 --cpus-per-task=1 --partition=cpu-short --mem=1G \
    --nodelist=slurm-cpu-hm-7 --time 1:00:00 --pty bash
srun: error: Task launch for StepId=1137134.0 failed on node slurm-cpu-hm-7: Communication connection failure
srun: error: Application launch failed: Communication connection failure
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
Best regards,
Durai Arasan
MPI Tuebingen