[slurm-users] New Bright Cluster Slurm issue for AD users

Yugendra Guvvala yguvvala at cambridgecomputer.com
Fri Feb 22 20:44:09 UTC 2019


Just to close the loop on this.

This was not a Slurm issue; it was an AD configuration issue.

AD authentication needs to be set up on all nodes of the cluster so that Slurm can resolve the user ID on each of them. I had trouble with the sssd DB folders missing and with the sssd.conf file not having the appropriate permissions, so look out for those. You can copy the appropriate files and permissions from other working nodes and it should work fine.
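
For anyone who wants to check a node quickly, here is a minimal sketch of the two things that bit me: whether the node can resolve the UID at all, and whether sssd.conf has sane ownership and permissions. It is not from the Bright KB. The paths /etc/sssd/sssd.conf and /var/lib/sss/db are the usual SSSD defaults and are assumptions here, as is the root-owned, mode 0600 expectation; the function names are just illustrative. Adjust for your install.

```python
import os
import pwd
import stat


def uid_resolves(uid):
    """Return True if this node can map the numeric UID to a user name."""
    try:
        pwd.getpwuid(uid)  # goes through NSS, so it covers sssd/AD users too
        return True
    except KeyError:
        return False


def conf_problems(path, want_uid=0, want_mode=0o600):
    """List ownership/permission problems for an sssd.conf-style file."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return ["missing: " + path]
    problems = []
    if st.st_uid != want_uid:
        problems.append("owner uid is %d, expected %d" % (st.st_uid, want_uid))
    mode = stat.S_IMODE(st.st_mode)
    if mode != want_mode:
        problems.append("mode is %o, expected %o" % (mode, want_mode))
    return problems


if __name__ == "__main__":
    uid = 10952  # the UID from the srun error in this thread
    print("uid", uid, "resolves:", uid_resolves(uid))
    # Assumed default config path; adjust if your install differs.
    for problem in conf_problems("/etc/sssd/sssd.conf"):
        print("sssd.conf:", problem)
    # Assumed default SSSD cache directory.
    print("db dir present:", os.path.isdir("/var/lib/sss/db"))
```

Run it on a compute node where the job failed and on a node that works, then compare the output. If the UID does not resolve on the compute node, Slurm has no way to map the job to a user there.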

For anyone curious, here is the link to the Bright Computing KB article that helped us with the configuration.

https://kb.brightcomputing.com/faq/index.php?action=artikel&cat=13&id=224&artlang=en

Thanks, 
Yugi 



> On Feb 13, 2019, at 3:07 PM, John Hearns <hearnsj at googlemail.com> wrote:
> 
> Matthew, that deserves an explanation. Bright Computing Proof of Concept causes nightmares?
> That is a pretty strong assertion. Please give more details.
> 
> On Wed, 13 Feb 2019 at 16:01, Matthew BETTINGER <matthew.bettinger at external.total.com> wrote:
> One of the main guys, Panos, left Bright, so I have no answer to your specific question, but I hope you can get some support with it. We dumped our BC PoC; the sysadmin who worked on the PoC still has nightmares.
> 
> On 2/13/19, 6:54 AM, "slurm-users on behalf of John Hearns" <slurm-users-bounces at lists.schedmd.com on behalf of hearnsj at googlemail.com> wrote:
> 
>     Yugendra,  the Bright support guys are excellent. 
>     Slurm is their default choice. I would ask again. Yes, Slurm is technically out of scope for them, but they should help a bit.
> 
> 
>     By the way, I think your problem is that you have configured authentication using AD on your head node.
>     BUT you have not configured it on the compute node images. You probably have to prepare a new compute node image, then push that out to the compute nodes.
> 
>     On Wed, 13 Feb 2019 at 12:35, Yugendra Guvvala <yguvvala at cambridgecomputer.com> wrote:
> 
> 
>     Also reached out to bright computing support and they say slurm is out of scope for them. 
> 
>     Thanks,
>     Yugi
> 
> 
>     On Feb 13, 2019, at 7:27 AM, Antony Cleave <antony.cleave at gmail.com> wrote:
> 
> 
> 
>     Can you ssh to the compute node that the job was trying to run on as the AD user in question?
> 
> 
>     I've seen similar issues on AD-integrated systems where some nodes boot from a different image that has not yet been joined to the domain.
> 
> 
>     Antony
> 
> 
>     On Wed, 13 Feb 2019 at 04:58, Yugendra Guvvala <yguvvala at cambridgecomputer.com> wrote:
> 
> 
>     Hi, 
> 
> 
>     We are bringing a new cluster online. We installed SLURM through Bright Cluster Manager; however, we are running into an issue.
> 
> 
>     We are able to run jobs as the root user and as users created through Bright Cluster Manager (cmsh commands). However, we use AD authentication for all our users, and when we try to submit jobs to Slurm as AD users we get the following error message.
> 
> 
> 
> 
>     srun: fatal: Invalid user id: 10952
>     srun: fatal: Invalid user id: 10952
>     srun: error: cnode001: task 0: Exited with exit code 1
> 
> 
> 
>     Attached is the slurm.conf file for reference. Please let us know if you have any insight into this.
> 
> 
> 
> 
> 
> 
>     Thanks, 
>     Yugi
> 
> 
>     Yugendra Guvvala | HPC Technologist | Cambridge Computer | "Artists in Data Storage"
>     Direct: 781-250-3273 | Cell: 806-773-4464 | yguvvala at cambridgecomputer.com | www.cambridgecomputer.com
> 
> 
