[slurm-users] Slurm missing non primary group memberships
Janne Blomqvist
janne.blomqvist at aalto.fi
Tue Nov 20 04:12:59 MST 2018
On 10/11/2018 13.17, Douglas Jacobsen wrote:
> We've had issues getting sssd to work reliably on compute nodes (at
> least at scale), the reason is not fully understood, but basically if
> the connection times out with sssd it'll black list the server for 60s,
> which then causes those kinds of issues.
In our experience sssd doesn't work reliably in large environments if
user/group enumeration is enabled (the "enumerate" config option).
Slurm used to require enumeration, but in
https://github.com/SchedMD/slurm/commit/48a4cdf8d9433b5655a26581768200e7a696ce87
I reworked the logic so that it should only be required in some unusual
special cases. That patch went in several years ago, so hopefully whatever
bugs it caused have been ironed out by now (*knocking on wood*).
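
For illustration only, here is a minimal Python sketch of the difference
between walking the whole group database (which needs enumeration) and
asking NSS for one user's groups directly. It is not the code from the
commit above; the function names are hypothetical, and it assumes a
Linux/glibc host where Python's grp and os modules are backed by NSS.

```python
import grp
import os


def groups_by_enumeration(user, primary_gid):
    """Collect gids by walking every group entry.

    This relies on the NSS backend being able to list all groups, which
    is what sssd's "enumerate" option provides for remote domains.
    """
    gids = {primary_gid}
    for entry in grp.getgrall():  # full pass over the group database
        if user in entry.gr_mem:
            gids.add(entry.gr_gid)
    return sorted(gids)


def groups_by_direct_lookup(user, primary_gid):
    """Collect gids with a single per-user query (getgrouplist(3)).

    No enumeration is needed; the backend only has to answer
    "which groups is this user in?".
    """
    return sorted(set(os.getgrouplist(user, primary_gid)))
```

With enumeration disabled in sssd, the first function would typically
see only local groups, while the second can still return the
directory-provided secondary groups.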
> Setting LaunchParameters=send_gids will sidestep this issue by doing the
> lookups exclusively on the controller node, where more frequent
> connections can prevent time decay disconnections and reduce the
> likelihood of cache misses.
This is probably a good idea, particularly if one has large parallel jobs;
otherwise the nodes could DoS the AD/LDAP servers at job launch if the
cache is cold.
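
A toy model of that load difference, as a sketch only: the Directory
class and both launch functions are hypothetical stand-ins, not Slurm's
launch protocol, and each node is assumed to start with a cold cache.

```python
class Directory:
    """Stand-in for an AD/LDAP server that counts the queries it gets."""

    def __init__(self, memberships):
        self.memberships = memberships  # user name -> list of gids
        self.queries = 0

    def lookup_gids(self, user):
        self.queries += 1
        return list(self.memberships.get(user, []))


def launch_with_node_lookups(directory, user, nodes):
    """Each node resolves the user's groups itself: one query per node."""
    return {node: directory.lookup_gids(user) for node in nodes}


def launch_with_send_gids(directory, user, nodes):
    """The controller resolves once and ships the gid list to every node."""
    gids = directory.lookup_gids(user)
    return {node: list(gids) for node in nodes}
```

For a job on 1000 nodes, the first variant issues 1000 directory
queries at launch and the second issues one.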
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqvist at aalto.fi