[slurm-users] slurm_receive_msg: Insane message length
hollowec at bnl.gov
Tue Jun 7 20:04:48 UTC 2022
We have a 27-node cluster that is subject to Nessus security scanning. During scans, we are frequently seeing jobs terminate with the following error:
srun: error: eio_message_socket_accept: slurm_receive_msg[10.10.10.1:33072]: Insane message length
srun: error: eio_message_socket_accept: slurm_receive_msg[10.10.10.1:54614]: Insane message length
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
10.10.10.1 in the logs is the scanner's IP address. To me, this looks like an issue in the Slurm pmi2 component that is triggered by the scan. Have others seen this? Is there a way to avoid it other than disabling the scans or using iptables to block the scanner from certain ports? We are running Slurm 20.11.9.
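For context, here is a minimal sketch of how I understand this kind of error can arise. It assumes the receiver reads a 4-byte, network-order length header before each message and rejects lengths above some limit. The names read_length_prefix and is_sane and the 1 GiB MAX_MSG_SIZE value are hypothetical, chosen for illustration; this is not taken from the Slurm source. Under that assumption, a scanner probe such as an HTTP request would be read as an absurdly large length:

```python
import struct

# Hypothetical upper bound on message size, assumed for illustration only.
MAX_MSG_SIZE = 1024 * 1024 * 1024


def read_length_prefix(data: bytes) -> int:
    """Interpret the first 4 bytes as an unsigned big-endian (network order) length."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for the length header")
    (length,) = struct.unpack("!I", data[:4])
    return length


def is_sane(data: bytes, limit: int = MAX_MSG_SIZE) -> bool:
    """Return True if the declared message length does not exceed the limit."""
    return read_length_prefix(data) <= limit


if __name__ == "__main__":
    # A well-formed frame: 4-byte header declaring 128 bytes, then the payload.
    good = struct.pack("!I", 128) + b"\x00" * 128
    # What a scanner's HTTP probe might look like to the receiver.
    probe = b"GET / HTTP/1.1\r\n"
    print(read_length_prefix(good), is_sane(good))
    print(read_length_prefix(probe), is_sane(probe))
```

In this sketch the ASCII bytes "GET " decode to a length of more than one billion, so the probe fails the check, while the well-formed frame passes. If something like this is what is happening, the scanner's connections would need to be kept away from the ports srun listens on, or be rejected without affecting the job step.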
Scientific Data and Computing Center
Brookhaven National Laboratory