<div dir="auto">Are you sure this isn't working as designed? <div dir="auto"><br></div><div dir="auto">I remember the manual says something annoying about groups. Here it is; this is why I prefer accounts to groups.</div><div dir="auto"><br></div><div dir="auto"><b style="margin:0px;padding:0px;border:0px;font-size:18px;line-height:inherit;font-family:'source sans pro',helvetica,arial,sans-serif;vertical-align:baseline;color:rgb(70,84,92);background-color:rgb(255,255,255)">NOTE:</b><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"> For performance reasons, Slurm maintains a list of user IDs allowed to use each partition and this is checked at job submission time. This list of user IDs is updated when the </span><b style="margin:0px;padding:0px;border:0px;font-size:18px;line-height:inherit;font-family:'source sans pro',helvetica,arial,sans-serif;vertical-align:baseline;color:rgb(70,84,92);background-color:rgb(255,255,255)">slurmctld</b><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"> daemon is restarted, reconfigured (e.g. "scontrol reconfig") or the partition's </span><b style="margin:0px;padding:0px;border:0px;font-size:18px;line-height:inherit;font-family:'source sans pro',helvetica,arial,sans-serif;vertical-align:baseline;color:rgb(70,84,92);background-color:rgb(255,255,255)">AllowGroups</b><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"> value is reset, even if its value is unchanged (e.g. "scontrol update PartitionName=name AllowGroups=group"). 
For a user's access to a partition to change, both his group membership must change and Slurm's internal user ID list must change using one of the methods described above.</span><br></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"><br></span></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)">Are you also adding users to groups after job submission? Does resetting AllowGroups on the partition fix it as well? </span></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)"><br></span></div><div dir="auto"><span style="color:rgb(70,84,92);font-family:'source sans pro',helvetica,arial,sans-serif;font-size:18px;background-color:rgb(255,255,255)">Antony</span></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, 13 Nov 2018, 09:13 Joerg Sassmannshausen &lt;<a href="mailto:joerg.sassmannshausen@crick.ac.uk">joerg.sassmannshausen@crick.ac.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>
<br>
I am wondering if this is the same issue we are having here as well.<br>
When I add users to a secondary group some time *after* the<br>
initial user installation, the user cannot access the Slurm partition<br>
they are supposed to. We found two remedies, more or less by chance:<br>
- rebooting both the Slurm server and the Slurm DB server<br>
- being patient and waiting long enough<br>
<br>
Obviously, neither remedy is suitable if you are running a large<br>
research environment. The reboot only happened because we physically had<br>
to move the servers, and we only waited long enough because we did<br>
not have an answer to the question.<br>
As already mentioned in a different posting, we deleted the user in<br>
Slurm and re-added it, and updated sssd on the Slurm server, all in<br>
vain.<br>
<br>
However, reading the thread, the latter case points to a caching<br>
problem, similar to the one described here. We are also using FreeIPA<br>
and hence sssd for the ID lookup.<br>
<br>
Poking the list a bit further on this subject: does anybody have similar<br>
experiences when the lookup is done directly against AD? We are planning to<br>
move to AD, and if that is also an issue, at least we are warned.<br>
<br>
All the best<br>
<br>
Jörg<br>
<br>
On 10/11/18 11:17, Douglas Jacobsen wrote:<br>
> We've had issues getting sssd to work reliably on compute nodes (at<br>
> least at scale). The reason is not fully understood, but basically if<br>
> sssd's connection times out, it blacklists the server for 60s,<br>
> which then causes these kinds of issues.<br>
><br>
> Setting LaunchParameters=send_gids will sidestep this issue by doing the<br>
> lookups exclusively on the controller node, where more frequent<br>
> connections can prevent timeout-related disconnections and reduce the<br>
> likelihood of cache misses.<br>
><br>
> On Fri, Nov 9, 2018 at 11:16 PM Chris Samuel &lt;<a href="mailto:chris@csamuel.org" target="_blank" rel="noreferrer">chris@csamuel.org</a>&gt; wrote:<br>
><br>
> On Friday, 9 November 2018 2:47:51 AM AEDT Aravindh Sampathkumar wrote:<br>
><br>
> > navtp@console2:~> ssh c07b07 id<br>
> > uid=29865(navtp) gid=510(finland) groups=510(finland),508(nav),5001(ghpc)<br>
> > context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023<br>
><br>
> Do you have SELinux configured by any chance?<br>
><br>
> If so, you might want to check whether it works with SELinux disabled first.<br>
><br>
> All the best,<br>
> Chris<br>
> --<br>
> Chris Samuel : <a href="http://www.csamuel.org/" rel="noreferrer noreferrer" target="_blank">http://www.csamuel.org/</a><br>
> : Melbourne, VIC<br>
<br>
--<br>
Dr. Jörg Saßmannshausen, MRSC<br>
HPC & Research Data System Engineer<br>
Scientific Computing<br>
The Francis Crick Institute<br>
1 Midland Road<br>
London, NW1 1AT<br>
email: <a href="mailto:joerg.sassmannshausen@crick.ac.uk" target="_blank" rel="noreferrer">joerg.sassmannshausen@crick.ac.uk</a><br>
phone: 020 379 65139<br>
The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT<br>
</blockquote></div>
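<div dir="auto">A minimal shell sketch of the refresh described in the slurm.conf note quoted above: it reads a partition's current AllowGroups value and re-applies it unchanged, which, per that note, makes slurmctld rebuild the partition's cached user-ID list. The helper names allowgroups_of and refresh_partition and the DRY_RUN switch are hypothetical; the sketch assumes scontrol is on PATH, that it is run with Slurm administrator rights, and that group names contain no spaces. Running "scontrol reconfig" is the broader alternative the note mentions.</div>

```shell
#!/bin/sh
# Sketch (hypothetical helpers): force slurmctld to rebuild a partition's
# cached user-ID list by re-applying the partition's current AllowGroups
# value, as the slurm.conf NOTE describes. Assumes `scontrol` is on PATH,
# admin rights, and group names without spaces.

# Read `scontrol show partition` output on stdin and print the AllowGroups
# value. Splitting on spaces works for both one-line (-o) and multi-line output.
allowgroups_of() {
    tr ' ' '\n' | sed -n 's/^AllowGroups=//p'
}

# Re-apply AllowGroups, unchanged, for the partition named in $1.
# With DRY_RUN=1 the update command is printed instead of executed.
refresh_partition() {
    part="$1"
    groups=$(scontrol -o show partition "$part" | allowgroups_of)
    if [ -z "$groups" ]; then
        echo "could not read AllowGroups for partition '$part'" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "scontrol update PartitionName=$part AllowGroups=$groups"
    else
        scontrol update PartitionName="$part" AllowGroups="$groups"
    fi
}
```

<div dir="auto">For example, with a hypothetical partition named gpu, "DRY_RUN=1 refresh_partition gpu" prints the update command it would run; without DRY_RUN it applies it. Per the note, the user also has to be in the group already for this refresh to grant access.</div>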