[slurm-users] Fwd: partition problem with 2 different users

Joerg Sassmannshausen joerg.sassmannshausen at crick.ac.uk
Thu Nov 1 06:15:07 MDT 2018


Hi Chris, hi all,

Apologies for that. I am not quite sure why that happened; I assume that
Exchange somehow does not like my electronic signature and converts it
into a winmail.dat file. Thunderbird seems to convert that back correctly.

Anyhow, this is the original email again, this time without being
signed. Please let me know if it is still being sent out as a winmail.dat
file.

Thanks for your patience and for bringing this to my attention.

Regards

Jörg

***************************************************************************

Dear all.

I am a bit puzzled about the behaviour of slurm. We are using partitions
and the users are allocated to certain partitions.
I have two users, say C and P, who are both in the same (unix) groups,
as confirmed with

$ id P
$ id C

We checked this on the node which is running slurm, on the node
which is running the DB, and on the login nodes.
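
For reference, a minimal sketch of that comparison as a script, assuming a
POSIX shell and the coreutils id command. The function names user_groups and
same_groups are made up for illustration, and C and P stand in for the real
user names.

```shell
#!/bin/sh
# Print a user's group names, one per line and sorted, so that two
# users can be compared regardless of the order id reports them in.
user_groups() {
    _g=$(id -Gn "$1") || return 1
    printf '%s\n' "$_g" | tr ' ' '\n' | sort
}

# Return 0 if both users are in exactly the same set of groups,
# 1 if the sets differ, 2 if either user could not be looked up.
same_groups() {
    a=$(user_groups "$1") || return 2
    b=$(user_groups "$2") || return 2
    [ "$a" = "$b" ]
}
```

Run on each host in turn (slurm node, DB node, login nodes), this would be
used as: same_groups C P && echo same || echo differ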

On the node which is running slurm, if I do this:

$ sacctmgr show user C
      User   Def Acct     Admin
---------- ---------- ---------
         C     u_swan      None

$ sacctmgr show user P
      User   Def Acct     Admin
---------- ---------- ---------
         P     u_swan      None

So it appears to me that they have the same user attributes in slurm as
well (or whatever you want to call them).

However, if I run this command from the user's account on the login node,
where we submit jobs from:

$  srun --partition=FOO --pty bash

I get:
For user C:
srun: error: Unable to allocate resources: User's group not permitted to
use this partition

For user P:
I get a compute node.
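
The error for user C appears to relate to the partition's AllowGroups
setting. A minimal sketch for looking at that value, assuming that
scontrol show partition prints space-separated Key=Value pairs which include
AllowGroups= (the helper name allow_groups is made up for illustration):

```shell
#!/bin/sh
# Read `scontrol show partition` output on stdin and print only the
# value of the AllowGroups field (assumed to be a Key=Value pair).
allow_groups() {
    tr ' ' '\n' | sed -n 's/^AllowGroups=//p'
}
```

On the node running slurm this would be used as
scontrol show partition FOO | allow_groups, and each group it prints can
then be checked with getent group followed by the group name, to see whether
C and P are both listed as members there.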

We had this problem before with user P. Since we had to reboot the
slurm-node and the DB-node, the problem fixed itself after the reboot.
As this is a production system, I cannot schedule another outage of
slurm just to get the problem fixed that way.

So, what is going on here? Why does slurm get it right for one user
after the reboot, whereas when I fixed the (unix) group for the second user
(done in freeIPA) *after* the reboot, we got the same problem again?
I have already tried deleting the problematic user account in slurm like
this:
$ sacctmgr del user C
and re-creating it via the ansible script we are using. This did not
fix the problem.

We are using CentOS Linux release 7.4.1708 with slurm version 15.08.13.

We are a bit puzzled about this so I have decided to ask here.

All the best from a sunny London

Jörg

--
Dr. Jörg Saßmannshausen, MRSC
HPC & Research Data System Engineer
Scientific Computing
The Francis Crick Institute
1 Midland Way
London, NW1 1AT
email: joerg.sassmannshausen at crick.ac.uk
phone: 020 379 65139



On 01/11/18 11:48, Chris Samuel wrote:
> Hi Joerg,
>
> Looks like something went wrong with this email, all it had was a winmail.dat
> attachment.
>
> cheers,
> Chris
>



