[slurm-users] Slurm missing non primary group memberships

Antony Cleave antony.cleave at gmail.com
Tue Nov 13 02:21:34 MST 2018


Are you sure this isn't working as designed?

I remember there is something annoying about groups in the manual.  Here it
is. This is why I prefer accounts.

*NOTE:* For performance reasons, Slurm maintains a list of user IDs allowed
to use each partition and this is checked at job submission time. This list
of user IDs is updated when the *slurmctld* daemon is restarted,
reconfigured (e.g. "scontrol reconfig") or the partition's *AllowGroups* value
is reset, even if its value is unchanged (e.g. "scontrol update
PartitionName=name AllowGroups=group"). For a user's access to a partition
to change, both his group membership must change and Slurm's internal user
ID list must change using one of the methods described above.
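To make that caveat concrete, here is a minimal Python sketch that models the behaviour the manual describes: the partition's allowed UID list is a snapshot built at restart/reconfigure time, so a later group change is invisible until the snapshot is rebuilt. The names (`Partition`, `rebuild_uid_list`, `may_submit`) are hypothetical and only illustrate the caching semantics; this is not Slurm's actual code.

```python
# Hypothetical model of the AllowGroups caching described in the NOTE above.
# None of these names are Slurm APIs.
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    allow_groups: set[str]
    # Snapshot of allowed UIDs; only rebuilt on restart, reconfigure,
    # or an AllowGroups reset (modelled by rebuild_uid_list).
    allowed_uids: set[int] = field(default_factory=set)

    def rebuild_uid_list(self, group_members: dict[str, set[int]]) -> None:
        """Recompute the UID snapshot from the current group database."""
        self.allowed_uids = set()
        for group in self.allow_groups:
            # |= copies the members into our own set, so later changes to
            # group_members do not leak into the snapshot.
            self.allowed_uids |= group_members.get(group, set())

    def may_submit(self, uid: int) -> bool:
        """Submission-time check: consults only the cached snapshot."""
        return uid in self.allowed_uids


# Group database as the OS (e.g. sssd) would report it.
groups = {"ghpc": {1001}}
part = Partition("gpu", {"ghpc"})
part.rebuild_uid_list(groups)      # e.g. slurmctld start

groups["ghpc"].add(29865)          # user added to the secondary group later
print(part.may_submit(29865))      # False: the snapshot is stale
part.rebuild_uid_list(groups)      # e.g. "scontrol reconfig"
print(part.may_submit(29865))      # True
```

The point of the model is that fixing the group in the directory alone is not enough; the snapshot has to be rebuilt as well.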

Are you adding the group memberships after submission too? Does resetting
AllowGroups on the partition fix it too?
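Before reaching for a restart, it may help to confirm what the name service (sssd/FreeIPA in this thread) is actually returning on the controller host. A minimal Python sketch, assuming it runs on the slurmctld host with the same NSS configuration; `nss_groups` is a hypothetical helper name, not part of Slurm:

```python
import grp
import os
import pwd
import sys


def nss_groups(user: str) -> set[str]:
    """Return the names of all groups NSS reports for `user` (primary + secondary)."""
    pw = pwd.getpwnam(user)
    names = set()
    for gid in os.getgrouplist(user, pw.pw_gid):
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # GID with no resolvable group name; record it numerically.
            names.add(str(gid))
    return names


if __name__ == "__main__":
    # Default to the invoking user if no user name is given.
    target = sys.argv[1] if len(sys.argv) > 1 else pwd.getpwuid(os.getuid()).pw_name
    print(sorted(nss_groups(target)))
```

If the expected secondary group shows up here but submission is still rejected, the stale partition UID list from the manual note is the more likely culprit than the directory lookup.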

Antony

On Tue, 13 Nov 2018, 09:13 Joerg Sassmannshausen <
joerg.sassmannshausen at crick.ac.uk> wrote:

> Dear all,
>
> I am wondering if that is the same issue we are having here as well.
> When I add users to a secondary group some time *after* the
> initial user installation, the users cannot access the Slurm partition they
> are supposed to. We found two remedies, more or less by chance:
> - rebooting both the Slurm server and the Slurm DB server
> - being patient and waiting long enough
>
> Obviously, neither remedy is suitable if you are running a large
> research environment. The reboot happened because we physically had to
> move the servers, and we only waited long enough because we did
> not have an answer to the question.
> As already mentioned in a different posting, we have deleted the user in
> Slurm and re-added it, and updated sssd on the Slurm server, all in
> vain.
>
> However, reading the thread, the latter case points to a caching
> problem, similar to the one described here. We are also using FreeIPA
> and hence sssd for the ID lookup.
>
> Poking the list a bit further on this subject: does anybody have similar
> experiences when the lookup is done directly against AD? We are planning to
> move to AD, and if that is also an issue we would at least be forewarned.
>
> All the best
>
> Jörg
>
> On 10/11/18 11:17, Douglas Jacobsen wrote:
> > We've had issues getting sssd to work reliably on compute nodes (at
> > least at scale). The reason is not fully understood, but basically if
> > sssd's connection times out, it will blacklist the server for 60s,
> > which then causes these kinds of issues.
> >
> > Setting LaunchParameters=send_gids will sidestep this issue by doing the
> > lookups exclusively on the controller node, where more frequent
> > connections can prevent time decay disconnections and reduce the
> > likelihood of cache misses.
> >
> > On Fri, Nov 9, 2018 at 11:16 PM Chris Samuel <chris at csamuel.org
> > <mailto:chris at csamuel.org>> wrote:
> >
> >     On Friday, 9 November 2018 2:47:51 AM AEDT Aravindh Sampathkumar wrote:
> >
> >     > navtp at console2:~> ssh c07b07 id
> >     > uid=29865(navtp) gid=510(finland) groups=510(finland),508(nav),5001(ghpc)
> >     > context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> >
> >     Do you have SElinux configured by some chance?
> >
> >     If so, you might want to check whether it works with it disabled first.
> >
> >     All the best,
> >     Chris
> >     --
> >      Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> >
> >
> >
> >
>
> --
> Dr. Jörg Saßmannshausen, MRSC
> HPC & Research Data System Engineer
> Scientific Computing
> The Francis Crick Institute
> 1 Midland Way
> London, NW1 1AT
> email: joerg.sassmannshausen at crick.ac.uk
> phone: 020 379 65139
>

