[slurm-users] Slurm missing non primary group memberships

Joerg Sassmannshausen joerg.sassmannshausen at crick.ac.uk
Tue Nov 13 02:10:47 MST 2018


Dear all,

I am wondering whether this is the same issue we are having here as well.
When I add a user to a secondary group some time *after* the
initial user installation, the user cannot access the Slurm partition they
are supposed to. We found two remedies here, more or less by chance:
- rebooting both the Slurm server and the Slurm DB server
- being patient and waiting long enough

Obviously, neither remedy is suitable if you are running a large
research environment. The reboot only happened because we physically had
to move the servers, and the waiting was simply because we did not have
an answer to the question.
As already mentioned in a different posting, we have deleted the user in
Slurm and re-added it, and updated sssd on the Slurm server, all in
vain.

However, reading the thread, the latter case (waiting long enough) points
to a caching problem, similar to the one described here. We are also
using FreeIPA and hence sssd for the ID lookup.
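
To narrow down whether a node is serving stale group data, one option is
to compare the `id` output from different hosts against the groups the
user is expected to have. The following is only a minimal sketch in
Python, not something taken from this thread: the function names and the
group "newproject" are hypothetical, the sample line is the `id` output
quoted further down, and it assumes group names contain no spaces (which
may not hold for AD groups).

```python
import re
from typing import Dict, Iterable, List

# Sample `id` output, taken from the message quoted below in this thread.
SAMPLE_ID_OUTPUT = (
    "uid=29865(navtp) gid=510(finland) "
    "groups=510(finland),508(nav),5001(ghpc) "
    "context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023"
)


def parse_id_groups(id_output: str) -> Dict[int, str]:
    """Return {gid: name} from the groups= field of `id` output.

    Assumption: group names contain no whitespace.
    """
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return {}
    groups: Dict[int, str] = {}
    for entry in match.group(1).split(","):
        # Each entry looks like "508(nav)"; the name part may be absent.
        m = re.fullmatch(r"(\d+)(?:\((.*)\))?", entry)
        if m:
            groups[int(m.group(1))] = m.group(2) or ""
    return groups


def missing_groups(id_output: str, expected: Iterable[str]) -> List[str]:
    """Return the expected group names that `id` did not report, sorted."""
    present = set(parse_id_groups(id_output).values())
    return sorted(set(expected) - present)


if __name__ == "__main__":
    # "newproject" is a hypothetical group added after the account was created.
    print(missing_groups(SAMPLE_ID_OUTPUT, ["nav", "ghpc", "newproject"]))
```

Feeding it the output of `id <user>` from the Slurm server and from a
compute node (for example via ssh) should show which host, if any, is
still missing the newly added group.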

Poking the list a bit further on this subject: has anybody had similar
experiences when the lookup is done directly against AD? We are planning
to move to AD, and if that is also an issue there, at least we are warned.

All the best

Jörg

On 10/11/18 11:17, Douglas Jacobsen wrote:
> We've had issues getting sssd to work reliably on compute nodes (at
> least at scale), the reason is not fully understood, but basically if
> the connection times out with sssd it'll black list the server for 60s,
> which then causes those kinds of issues.
>
> Setting LaunchParameters=send_gids will sidestep this issue by doing the
> lookups exclusively on the controller node, where more frequent
> connections can prevent time decay disconnections and reduce the
> likelihood of cache misses.
>
> On Fri, Nov 9, 2018 at 11:16 PM Chris Samuel <chris at csamuel.org> wrote:
>
>     On Friday, 9 November 2018 2:47:51 AM AEDT Aravindh Sampathkumar wrote:
>
>     > navtp at console2:~> ssh c07b07 id
>     > uid=29865(navtp) gid=510(finland)
>     > groups=510(finland),508(nav),5001(ghpc)
>     > context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
>
>     Do you have SElinux configured by some chance?
>
>     If so you might want to check if it works with it disabled first..
>
>     All the best,
>     Chris
>     --
>      Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>
>
>
> --
> Sent from Gmail Mobile

--
Dr. Jörg Saßmannshausen, MRSC
HPC & Research Data System Engineer
Scientific Computing
The Francis Crick Institute
1 Midland Way
London, NW1 1AT
email: joerg.sassmannshausen at crick.ac.uk
phone: 020 379 65139
The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT

