[slurm-users] SLURM disregards LDAP configured via SSSD

Leopold Talirz leopold.talirz at gmail.com
Tue Oct 10 21:30:44 UTC 2023

Solution: `UsePAM=1` in the slurm.conf, and `ln -s /etc/pam.d/sshd

The documentation of UsePAM in https://slurm.schedmd.com/slurm.conf.html is
actually quite clear - when googling, I was somehow just confused by the
various references to pam_slurm / pam_slurm_adopt.
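
For anyone hitting the same symptom, here is a rough sketch of how one might
check whether the setting is actually present in a given slurm.conf. The
helper name `check_usepam` and the example path are assumptions made for
illustration, not something taken from the SLURM tooling, and the sketch only
inspects the config file, not the PAM setup itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): report whether a slurm.conf-style
# file contains an uncommented "UsePAM=1" line.
# The file path is supplied by the caller; where slurm.conf lives depends on
# the installation.
check_usepam() {
    conf="$1"
    # -i: parameter names in slurm.conf are not case sensitive.
    # The trailing group avoids matching values such as "UsePAM=10".
    if grep -Eiq '^[[:space:]]*UsePAM=1([[:space:]]|#|$)' "$conf"; then
        echo "UsePAM enabled"
    else
        echo "UsePAM not enabled"
        return 1
    fi
}

# Example call (path is an assumption, adjust to your installation):
#   check_usepam /etc/slurm/slurm.conf
#
# To see whether a job step can resolve the submitting user at all, one
# could compare the output of the following with the same command over SSH:
#   srun sh -c 'getent passwd "$(id -u)"'
```

The daemons generally have to be restarted or reconfigured before a changed
slurm.conf takes effect, so a positive result here does not by itself mean
the running slurmd is using PAM.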

On Tue, 10 Oct 2023 at 22:56, Leopold Talirz <leopold.talirz at gmail.com> wrote:

> Hi,
> I have an issue with SLURM (20.11.9) in conjunction with LDAP user
> accounts.
> Both the scheduler node, where slurmctld is running, and the worker nodes
> that are spun up by slurm are running the SSSD, which fetches user accounts
> from an external LDAP server.
> This works fine: I can log into the scheduler _and_ the worker nodes using
> SSH as an LDAP user without problems.
> This does not work: If, instead of SSH, I connect to a worker node via a
> slurm job, i.e. using `srun` (or `sbatch`), I get
> whoami: cannot find name for user ID 1290486416
> It seems that, for some reason, SLURM does not rely on the same
> authentication mechanism (configured via /etc/pam.d/*) as SSH.
> Any ideas what may be causing this or which logs I should be looking at to
> understand what is going on here?
> Potentially relevant further information:
> - The scheduler is running CentOS 7.9 (meaning /etc/pam.d is configured
> via the older authconfig), while the worker nodes are running AlmaLinux 8.7
> (meaning /etc/pam.d is configured via the newer authselect). As described
> above, both work fine when connecting via SSH, but I don't know whether
> slurm imposes additional requirements between the scheduler VM and the
> workers.
> - After I log in via SSH to one of the worker nodes for the first time,
> `srun` then also starts working (it recognizes the user account, apparently
> it is now seeing it in some cache). However, there are still differences
> between the user state when logging in via SSH and via srun - for example,
> when using `srun` the user account does not have access to /dev/nvidia*
> devices, i.e. nvidia-smi shows "no devices found", while logging in via SSH
> shows the devices correctly.
> Best wishes,
> Leopold