[slurm-users] New Bright Cluster Slurm issue for AD users

Yugendra Guvvala yguvvala at cambridgecomputer.com
Wed Feb 13 13:08:36 UTC 2019


Thanks, guys. I will go through all the resources and report back on how it goes.

Thanks,
Yugi

> On Feb 13, 2019, at 7:58 AM, John Hearns <hearnsj at googlemail.com> wrote:
> 
> Please have a look at section 6.3 of the Bright Admin Manual.
> Have you run updateprovisioners and then rebooted the nodes?
> 
> 
> Configuring The Cluster To Authenticate Against An External LDAP Server
>
> The cluster can be configured in different ways to authenticate against an external LDAP server. For smaller clusters, a configuration where LDAP clients on all nodes point directly to the external server is recommended. An easy way to set this up is as follows:
>
> • On the head node:
>   – In distributions that are:
>     * derived from prior to RHEL 6: the URIs in /etc/ldap.conf, and in the image file /cm/images/default-image/etc/ldap.conf are set to point to the external LDAP server.
>     * derived from the RHEL 6.x series: the file /etc/ldap.conf does not exist. The files in which the changes then need to be made are /etc/nslcd.conf and /etc/pam_ldap.conf. To implement the changes, the nslcd daemon must then be restarted, for example with service nslcd restart.
>     * derived from RHEL 7.x series: the file /etc/ldap.conf does not exist. The files in which the changes then need to be made are /etc/nslcd.conf and /etc/openldap/ldap.conf. To implement the changes, the nslcd daemon must then be restarted, for example with service nslcd restart.
>   – the updateprovisioners command (section 5.2.4) is run to update any other provisioners.
> • Then, to update configurations on the regular nodes so that they are able to do LDAP lookups:
>   – They can simply be rebooted to pick up the updated configuration, along with the new software image.
>   – Alternatively, to avoid a reboot, the imageupdate command (section 5.6.2) can be run to pick up the new software image from a provisioner.
> 
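One way to sanity-check the manual's steps is to compare the LDAP server URIs configured on the head node with those in the software image. The following is only a rough sketch: it assumes an nslcd-based setup (RHEL 6.x/7.x derived), and the image path for nslcd.conf is an assumption extrapolated from the /cm/images/default-image/etc/ldap.conf path quoted above. The helper name ldap_uris is hypothetical.

```python
from pathlib import Path


def ldap_uris(conf_text):
    """Collect the values of every `uri` directive in nslcd.conf-style text."""
    uris = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (those starting with '#').
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if fields[0] == "uri":
            uris.extend(fields[1:])
    return uris


if __name__ == "__main__":
    # Assumed locations: the head node's own file and the copy inside the
    # default software image. Adjust both to match the actual cluster.
    paths = [
        Path("/etc/nslcd.conf"),
        Path("/cm/images/default-image/etc/nslcd.conf"),
    ]
    for path in paths:
        try:
            print(path, ldap_uris(path.read_text()))
        except OSError as exc:
            # The file may be missing, or readable only by root.
            print(path, "could not be read:", exc)
```

If the two lists differ, the image has probably not picked up the head node's change, so nodes provisioned from it would still point at the old server.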
>> On Wed, 13 Feb 2019 at 12:55, Antony Cleave <antony.cleave at gmail.com> wrote:
>> Can you ssh in as root and then su to the AD user to make sure that the node is integrated correctly?
>> 
>> If you cannot su to an AD user on the node then Slurm will not be able to resolve the UID either as they use the same methods.
>> 
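The check Antony describes can also be done without su, by asking the node's name service directly for the UID from the error message. This is a minimal sketch, assuming Python is available on the compute node and is run there as root; the function name resolve_uid is hypothetical, and 10952 is the UID reported by srun in this thread.

```python
import pwd


def resolve_uid(uid, lookup=pwd.getpwuid):
    """Return the user name for uid, or None if the lookup finds no entry."""
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no passwd entry exists for the
        # UID in any source the node is configured to consult.
        return None


if __name__ == "__main__":
    uid = 10952  # UID from the "Invalid user id" error in this thread
    name = resolve_uid(uid)
    if name is None:
        print(f"UID {uid} does not resolve on this node")
    else:
        print(f"UID {uid} resolves to {name}")
```

If the UID does not resolve here, the node itself cannot see the AD user, which would point at the node image rather than at Slurm or munge.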
>>> On Wed, 13 Feb 2019, 12:35 Yugendra Guvvala, <yguvvala at cambridgecomputer.com> wrote:
>>> No, we can’t ssh to the compute nodes. This is by design: no one other than root should be able to ssh to them.
>>> 
>>> I figure that munge is not configured for AD. We have configured our login image for AD, and the Slurm and munge configurations are on the head node. Not sure how to integrate these.
>>> 
>>> Thanks,
>>> Yugi
>>> 
>>>> On Feb 13, 2019, at 7:27 AM, Antony Cleave <antony.cleave at gmail.com> wrote:
>>>> 
>>>> Can you ssh to the compute node that the job was trying to run on as the AD user in question?
>>>> 
>>>> I've seen similar issues on AD-integrated systems where some nodes boot from a different image that has not yet been joined to the domain.
>>>> 
>>>> Antony
>>>> 
>>>>> On Wed, 13 Feb 2019 at 04:58, Yugendra Guvvala <yguvvala at cambridgecomputer.com> wrote:
>>>>> Hi, 
>>>>> 
>>>>> We are bringing a new cluster online. We installed Slurm through Bright Cluster Manager; however, we are running into an issue.
>>>>> 
>>>>> We are able to run jobs as the root user and as users created using Bright Cluster Manager (cmsh commands). However, we use AD authentication for all our users, and when we try to submit jobs to Slurm as AD users we get the following error message.
>>>>> 
>>>>> 
>>>>> srun: fatal: Invalid user id: 10952
>>>>> srun: fatal: Invalid user id: 10952
>>>>> srun: error: cnode001: task 0: Exited with exit code 1
>>>>> 
>>>>> Attached is the slurm.conf file for reference. Please let us know if you have any insight into this.
>>>>> 
>>>>> 
>>>>> 
>>>>> Thanks, 
>>>>> Yugi
>>>>> 
>>>>> Yugendra Guvvala | HPC Technologist  | Cambridge Computer  | "Artists in Data Storage" 
>>>>> Direct: 781-250-3273  | Cell: 806-773-4464  | yguvvala at cambridgecomputer.com  | www.cambridgecomputer.com
>>>>> 
>>>>> _______________________________________________________________________________________________
>>>>> 