16 Apr
2024
11:08 a.m.
Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com> writes:
> Therefore I believe that the root cause of the present issue is user applications opening a lot of files on our 96-core nodes, and we need to increase fs.file-max.
You could also set a per-user limit, for instance in /etc/security/limits.d/. Users would then be blocked from opening an unreasonable number of files. One could use this to find out which applications are responsible, and try to get them fixed.

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
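To find which applications are responsible, one option is to count open file descriptors per process on an affected node. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names `open_fds_by_process` and `open_fds_by_user` are made up for illustration, and the per-process descriptor count is only a rough proxy for the system-wide file handles that fs.file-max limits. It needs to run as root to see other users' processes.

```python
import os
from collections import Counter


def open_fds_by_process(proc_root="/proc"):
    """Return (fd_count, uid, pid, command) tuples, largest fd_count first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        pid_dir = os.path.join(proc_root, entry)
        try:
            # Each entry in /proc/<pid>/fd is one open file descriptor.
            fd_count = len(os.listdir(os.path.join(pid_dir, "fd")))
            # The owner of /proc/<pid> is normally the user running the process.
            uid = os.stat(pid_dir).st_uid
            with open(os.path.join(pid_dir, "comm")) as f:
                command = f.read().strip()
        except OSError:
            continue  # process exited meanwhile, or permission denied
        rows.append((fd_count, uid, int(entry), command))
    rows.sort(reverse=True)
    return rows


def open_fds_by_user(rows):
    """Sum the descriptor counts per uid."""
    totals = Counter()
    for fd_count, uid, _pid, _command in rows:
        totals[uid] += fd_count
    return totals


if __name__ == "__main__":
    rows = open_fds_by_process()
    # Show the ten processes holding the most descriptors.
    for fd_count, uid, pid, command in rows[:10]:
        print(f"{fd_count:8d}  uid={uid:<6d} pid={pid:<8d} {command}")
    # Then the per-user totals, largest first.
    for uid, total in open_fds_by_user(rows).most_common(5):
        print(f"uid={uid:<6d} total_fds={total}")
```

Run on a node while the problem is occurring, this should point at the processes and users holding the most descriptors, which can then guide the choice of a per-user `nofile` limit.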