[slurm-users] salloc: error: Error on msg accept socket: Too many open files
Andy Riebs
andy at candooz.com
Tue Feb 2 19:32:23 UTC 2021
Run salloc with a smaller number of nodes or tasks, then use lsof (or
your preferred tool for listing IP connections) to see how many
descriptors the salloc process holds. IIRC, each srun/node in the
allocation needs 70-80 IP connections with the node running salloc, so
a large node count can exhaust the default per-process limit on open
file descriptors.
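
To put rough numbers on that, here is a minimal sketch that compares the
current open-files limit with the estimate above. The 80-connections-per-node
figure is only the upper end of what I recalled, not a documented Slurm
value, and the `reserved` headroom and both function names are made up for
illustration.

```python
import resource

# Assumption: upper end of the recalled "70-80 connections per node" range.
CONNS_PER_NODE = 80


def descriptors_needed(nodes, conns_per_node=CONNS_PER_NODE, reserved=16):
    """Estimate descriptors used on the salloc host for a given node count.

    `reserved` is a hypothetical allowance for stdio, logs, and so on.
    """
    return nodes * conns_per_node + reserved


def max_nodes_for_limit(limit, conns_per_node=CONNS_PER_NODE, reserved=16):
    """Estimate how many nodes fit under a given open-files limit."""
    usable = max(limit - reserved, 0)
    return usable // conns_per_node


if __name__ == "__main__":
    # Soft and hard limits for the current process (same as `ulimit -n`
    # and `ulimit -Hn` in the shell that launched it).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("soft open-files limit: unlimited")
    else:
        print(f"soft open-files limit: {soft}")
        print(f"estimated max nodes:   {max_nodes_for_limit(soft)}")
    print(f"estimated need for 100 nodes: {descriptors_needed(100)}")
```

If the estimate for the job's node count comes out above the soft limit,
raising the limit in the shell that runs salloc (up to the hard limit) is
the first thing to try.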
On 2/2/2021 1:14 PM, Patrick Goetz wrote:
> That sounds like a linux issue. You probably need to reset the max
> limit for file descriptors someplace.
>
> Maybe start here:
> https://rtcamp.com/tutorials/linux/increase-open-files-limit/
>
> On 2/2/21 11:50 AM, Prentice Bisbal wrote:
>> Has anyone seen this error message before? A user just reported it. A
>> Google search doesn't turn up anything useful. I mean, I understand
>> what too many open files means, but I'm surprised to see it in the
>> context of salloc.
>>
>> salloc: error: Error on msg accept socket: Too many open files
>>
>