Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com> writes:

> Therefore I believe that the root cause of the present issue is user applications opening a lot of files on our 96-core nodes, and we need to increase fs.file-max.
You could also set a limit per user, for instance in /etc/security/limits.d/ (the "nofile" item, which applies to each of the user's processes). Users would then be blocked from opening an unreasonable number of files. You could use this to find out which applications are responsible, and try to get them fixed.
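
To see which applications are the heavy hitters before settling on a limit, something along these lines might help. It is only a rough sketch: it counts the entries under /proc/<pid>/fd for each process and sums them per UID. The function names are my own, and it needs to run as root to see other users' processes; anything it cannot read is silently skipped.

```python
import os
from collections import Counter


def count_open_fds(proc_root="/proc"):
    """Return {pid: (uid, command, open_fd_count)} for every readable process."""
    results = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc_root, entry)
        try:
            nfds = len(os.listdir(os.path.join(pid_dir, "fd")))
            uid = os.stat(pid_dir).st_uid
            with open(os.path.join(pid_dir, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            # Process exited in the meantime, or we lack permission.
            continue
        results[int(entry)] = (uid, comm, nfds)
    return results


def top_processes(results, n=10):
    """Return the n (pid, (uid, command, count)) pairs with the most open fds."""
    return sorted(results.items(), key=lambda kv: kv[1][2], reverse=True)[:n]


def per_user_totals(results):
    """Sum the open fd counts per UID."""
    totals = Counter()
    for uid, _comm, nfds in results.values():
        totals[uid] += nfds
    return totals


if __name__ == "__main__":
    res = count_open_fds()
    print("Processes with the most open file descriptors:")
    for pid, (uid, comm, nfds) in top_processes(res):
        print(f"  pid={pid} uid={uid} comm={comm} fds={nfds}")
    print("Totals per UID:")
    for uid, total in per_user_totals(res).most_common(10):
        print(f"  uid={uid} fds={total}")
```

The per-process counts are what a "nofile" limit would act on. The system-wide figure that fs.file-max is compared against can be read from /proc/sys/fs/file-nr; it will not match the sum of the counts above exactly, since several descriptors can share one open file.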