<div dir="auto">Are you sure this isn't working as designed? <div dir="auto"><br></div><div dir="auto">I remember there being something annoying about groups in the manual. Here it is. This is why I prefer accounts.</div><div dir="auto"><br></div><div dir="auto"><b style="margin:0px;padding:0px;border:0px;font-size:18px;line-height:inherit;font-family:"source sans pro",helvetica,arial,sans-serif;vertical-align:baseline;color:rgb(70,84,92);background-color:rgb(255,255,255)">NOTE:</b><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"> For performance reasons, Slurm maintains a list of user IDs allowed to use each partition and this is checked at job submission time. This list of user IDs is updated when the </span><b style="margin:0px;padding:0px;border:0px;font-size:18px;line-height:inherit;font-family:"source sans pro",helvetica,arial,sans-serif;vertical-align:baseline;color:rgb(70,84,92);background-color:rgb(255,255,255)">slurmctld</b><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"> daemon is restarted, reconfigured (e.g. "scontrol reconfig") or the partition's </span><b style="margin:0px;padding:0px;border:0px;font-size:18px;line-height:inherit;font-family:"source sans pro",helvetica,arial,sans-serif;vertical-align:baseline;color:rgb(70,84,92);background-color:rgb(255,255,255)">AllowGroups</b><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"> value is reset, even if its value is unchanged (e.g. "scontrol update PartitionName=name AllowGroups=group"). 
For a user's access to a partition to change, both his group membership must change and Slurm's internal user ID list must change using one of the methods described above.</span><br></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"><br></span></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)">Are you adding groups after submission too? Does resetting AllowGroups on the partition fix it too? </span></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"><br></span></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:"source sans pro",helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)">Antony</span></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, 13 Nov 2018, 09:13 Joerg Sassmannshausen &lt;<a href="mailto:joerg.sassmannshausen@crick.ac.uk">joerg.sassmannshausen@crick.ac.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>

<br>

I am wondering if that is the same issue we are having here as well.<br>

When I add users to a secondary group some time *after* the<br>

initial user installation, the user cannot access the Slurm partition they<br>

are supposed to. We found two remedies here, more or less by chance:<br>

- rebooting both the slurm server and slurm DB server<br>

- be patient and wait for long enough<br>

<br>

Obviously, neither remedy is suitable if you are running a large<br>

research environment. The reboot only happened because we physically had to<br>

move the servers, and the waiting was simply because we did<br>

not have an answer to the question.<br>

As already mentioned in a different posting, we have deleted the user in<br>

slurm and re-installed it, updated the sssd on the slurm server, all in<br>

vain.<br>

<br>

However, reading the thread, the latter case points to a caching<br>

problem, similar to the one described here. We are also using FreeIPA<br>

and hence sssd for the ID lookup.<br>

<br>

Poking the list a bit further on this subject: does anybody have similar<br>

experiences when the lookup is done directly on AD? We are planning to<br>

move to AD, and if that is also an issue, at least we are warned here.<br>

<br>

All the best<br>

<br>

Jörg<br>

<br>

On 10/11/18 11:17, Douglas Jacobsen wrote:<br>

> We've had issues getting sssd to work reliably on compute nodes (at<br>

> least at scale), the reason is not fully understood, but basically if<br>

> the connection times out with sssd it'll black list the server for 60s,<br>

> which then causes those kinds of issues.<br>

><br>

> Setting LaunchParameters=send_gids will sidestep this issue by doing the<br>

> lookups exclusively on the controller node, where more frequent<br>

> connections can prevent time decay disconnections and reduce the<br>

> likelihood of cache misses.<br>

><br>

> On Fri, Nov 9, 2018 at 11:16 PM Chris Samuel &lt;<a href="mailto:chris@csamuel.org" target="_blank" rel="noreferrer">chris@csamuel.org</a>&gt; wrote:<br>

><br>

>     On Friday, 9 November 2018 2:47:51 AM AEDT Aravindh Sampathkumar wrote:<br>

><br>

>     > navtp@console2:~> ssh c07b07 id<br>

>     > uid=29865(navtp) gid=510(finland)<br>

>     groups=510(finland),508(nav),5001(ghpc)<br>

>     > context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023<br>

><br>

>     Do you have SElinux configured by some chance?<br>

><br>

>     If so you might want to check if it works with it disabled first..<br>

><br>

>     All the best,<br>

>     Chris<br>

>     --<br>

>      Chris Samuel  :  <a href="http://www.csamuel.org/" rel="noreferrer noreferrer" target="_blank">http://www.csamuel.org/</a><br>

>     :  Melbourne, VIC<br>

><br>

><br>

><br>

><br>

> --<br>

> Sent from Gmail Mobile<br>

<br>

--<br>

Dr. Jörg Saßmannshausen, MRSC<br>

HPC & Research Data System Engineer<br>

Scientific Computing<br>

The Francis Crick Institute<br>

1 Midland Way<br>

London, NW1 1AT<br>

email: <a href="mailto:joerg.sassmannshausen@crick.ac.uk" target="_blank" rel="noreferrer">joerg.sassmannshausen@crick.ac.uk</a><br>

phone: 020 379 65139<br>

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT<br>

</blockquote></div>
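<div dir="auto">P.S. As a rough sketch of the refresh described in the manual note above: the script below checks that the new group membership is visible, then resets AllowGroups so the cached user ID list is rebuilt. The partition "name" and group "group" are the manual's placeholders, and the <code>run</code> helper and <code>RUN</code> variable are made up for illustration. It only prints the commands unless RUN=1 is set, and the scontrol steps assume a host with Slurm admin rights.</div>

```shell
#!/bin/sh
# Sketch only: refresh Slurm's cached per-partition user ID list after a
# group change. PARTITION and GROUP default to the manual's placeholders.
set -eu

PARTITION="${PARTITION:-name}"
GROUP="${GROUP:-group}"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# 1. Check that the name service already reports the new membership;
#    the cached list can only pick up what is visible at refresh time.
run getent group "$GROUP"

# 2. Reset AllowGroups to the same value; per the manual note this
#    rebuilds the cached list even though the value is unchanged.
run scontrol update "PartitionName=$PARTITION" "AllowGroups=$GROUP"

# 3. Show the partition so the AllowGroups setting can be inspected.
run scontrol show partition "$PARTITION"
```

<div dir="auto">Running "scontrol reconfigure" in place of step 2 should have the same effect according to the note, at the cost of re-reading the whole configuration.</div>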