[slurm-users] New Bright Cluster Slurm issue for AD users

Matthew BETTINGER matthew.bettinger at external.total.com
Wed Feb 13 15:59:21 UTC 2019

Panos, one of the main guys, left Bright, so I have no answer to your specific question, but I hope you can get some support with it. We dumped our BC PoC; the sysadmin who worked on the PoC still has nightmares.

On 2/13/19, 6:54 AM, "slurm-users on behalf of John Hearns" <slurm-users-bounces at lists.schedmd.com on behalf of hearnsj at googlemail.com> wrote:

    Yugendra,  the Bright support guys are excellent. 
    Slurm is their default choice. I would ask again. Yes, Slurm is technically out of scope for them, but they should help a bit.
    By the way, I think your problem is that you have configured authentication using AD on your head node,
    BUT you have not configured it on the compute node images. You probably have to prepare a new compute node image and then push that out to the compute nodes.
    On Wed, 13 Feb 2019 at 12:35, Yugendra Guvvala <yguvvala at cambridgecomputer.com> wrote:
    We also reached out to Bright Computing support, and they say Slurm is out of scope for them.
    On Feb 13, 2019, at 7:27 AM, Antony Cleave <antony.cleave at gmail.com> wrote:
    Can you ssh to the compute node that the job was trying to run on as the AD user in question?
    I've seen similar issues on AD-integrated systems where some nodes boot from a different image that has not yet been joined to the domain.
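
A quick way to test this on a node is to ask the local name service whether it can resolve the UID from the error message. The following is only a minimal sketch, assuming Python is available on the compute node image; the helper name `uid_resolves` is hypothetical, and the default UID 10952 is simply the one reported by srun in this thread. If the lookup fails on cnode001 but succeeds on the head node, that would be consistent with the compute image not being set up for AD lookups.

```python
import pwd


def uid_resolves(uid, lookup=pwd.getpwuid):
    """Return the user name for uid, or None if this node cannot resolve it.

    `lookup` defaults to pwd.getpwuid, which consults whatever name service
    sources (local files, SSSD, winbind, ...) are configured on this node.
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no passwd entry exists for uid.
        return None


if __name__ == "__main__":
    uid = 10952  # UID taken from the srun error in this thread
    name = uid_resolves(uid)
    if name is None:
        print(f"UID {uid} does not resolve on this node")
    else:
        print(f"UID {uid} resolves to {name}")
```

Running the same script on the head node and on a compute node and comparing the two results should show whether the compute image is missing the AD configuration.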
    On Wed, 13 Feb 2019 at 04:58, Yugendra Guvvala <yguvvala at cambridgecomputer.com> wrote:
    We are bringing a new cluster online. We installed SLURM through Bright Cluster Manager; however, we are running into an issue here.
    We are able to run jobs as the root user and as users created using Bright Cluster Manager (cmsh commands). However, we use AD authentication for all our users, and when we try to submit jobs to Slurm as AD users we get the following error message:
    srun: fatal: Invalid user id: 10952
    srun: fatal: Invalid user id: 10952
    srun: error: cnode001: task 0: Exited with exit code 1
    Attached is the slurm.conf file for reference. Please let us know if you have any insight into this.
    Yugendra Guvvala | HPC Technologist | Cambridge Computer | "Artists in Data Storage"
    Direct: 781-250-3273 | Cell: 806-773-4464 | yguvvala at cambridgecomputer.com | www.cambridgecomputer.com <http://www.cambridgecomputer.com>
