[slurm-users] salloc: error: Error on msg accept socket: Too many open files

Andy Riebs andy at candooz.com
Tue Feb 2 19:32:23 UTC 2021


Run salloc with a smaller number of nodes or tasks, then look at the 
output of lsof (or your preferred tool for listing IP connections). IIRC, 
each srun/node in the allocation needs 70-80 IP connections to the 
node running salloc, so a large node count can exhaust the default 
file descriptor limit.
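
For a rough sense of scale, here is a minimal Python sketch of that arithmetic. It assumes the 70-80 connections per node figure recalled above (unverified, taken as 80 to be conservative). The helper names and the `reserved` allowance of 16 descriptors are made up for illustration and are not part of Slurm. Counting descriptors through /proc is Linux-specific.

```python
import os
import resource

# Assumption: upper end of the "70-80 connections per node" recollection.
CONNS_PER_NODE = 80
# Assumption: a guessed allowance for stdio, logs, and other descriptors.
RESERVED_FDS = 16


def fds_needed(nodes, conns_per_node=CONNS_PER_NODE, reserved=RESERVED_FDS):
    """Estimate descriptors used on the salloc host for a given node count."""
    return nodes * conns_per_node + reserved


def max_nodes_for_limit(soft_limit, conns_per_node=CONNS_PER_NODE,
                        reserved=RESERVED_FDS):
    """Estimate how many nodes fit under a soft descriptor limit.

    Returns None when the limit is unlimited.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None
    usable = max(soft_limit - reserved, 0)
    return usable // conns_per_node


def count_open_fds(pid):
    """Count open descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"estimated max nodes at soft limit: {max_nodes_for_limit(soft)}")
    if os.path.isdir("/proc/self/fd"):
        print(f"descriptors open in this process: {count_open_fds(os.getpid())}")
```

Under these assumptions, a soft limit of 1024 would cover only about a dozen nodes. In the shell that launches salloc, `ulimit -n` reports the soft limit and can raise it up to the hard limit for that session.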

On 2/2/2021 1:14 PM, Patrick Goetz wrote:
> That sounds like a Linux issue. You probably need to raise the max 
> limit for file descriptors somewhere.
>
> Maybe start here:
>  https://rtcamp.com/tutorials/linux/increase-open-files-limit/
>
> On 2/2/21 11:50 AM, Prentice Bisbal wrote:
>> Has anyone seen this error message before? A user just reported it. A 
>> Google search doesn't turn up anything useful. I mean, I understand 
>> what too many open files means, but I'm surprised to see it in the 
>> context of salloc.
>>
>> salloc: error: Error on msg accept socket: Too many open files
>>
>


