[slurm-users] New Bright Cluster Slurm issue for AD users

John Hearns hearnsj at googlemail.com
Wed Feb 13 20:07:13 UTC 2019


Matthew, that deserves an explanation. Bright Computing Proof of Concept
causes nightmares?
That is a pretty strong assertion. Please give more details.

On Wed, 13 Feb 2019 at 16:01, Matthew BETTINGER <
matthew.bettinger at external.total.com> wrote:

> One of the main guys, Panos, left Bright, so I have no answer to your specific
> question, but I hope you can get some support with it.  We dumped our BC
> PoC; the sysadmin who worked on the PoC still has nightmares.
>
> On 2/13/19, 6:54 AM, "slurm-users on behalf of John Hearns" <
> slurm-users-bounces at lists.schedmd.com on behalf of hearnsj at googlemail.com>
> wrote:
>
>     Yugendra, the Bright support guys are excellent.
>     Slurm is their default choice. I would ask again. Yes, Slurm is
> technically out of scope for them, but they should help a bit.
>
>
>     By the way, I think your problem is that you have configured
> authentication using AD on your head node.
>     BUT you have not configured it on the compute node images. You probably
> have to prepare a new compute node image and then push that out to the compute
> nodes.
>
>     On Wed, 13 Feb 2019 at 12:35, Yugendra Guvvala <
> yguvvala at cambridgecomputer.com> wrote:
>
>
>     We also reached out to Bright Computing support, and they say Slurm is out
> of scope for them.
>
>     Thanks,
>     Yugi
>
>
>     On Feb 13, 2019, at 7:27 AM, Antony Cleave <antony.cleave at gmail.com>
> wrote:
>
>
>
>     Can you ssh to the compute node that the job was trying to run on as
> the AD user in question?
>
>
>     I've seen similar issues on AD-integrated systems where some nodes
> boot from a different image that has not yet been joined to the domain.
>
>
>     Antony
>
>
>     On Wed, 13 Feb 2019 at 04:58, Yugendra Guvvala <
> yguvvala at cambridgecomputer.com> wrote:
>
>
>     Hi,
>
>
>     We are bringing a new cluster online. We installed SLURM through
> Bright Cluster Manager; however, we are running into an issue.
>
>
>     We are able to run jobs as the root user and as users created using Bright
> Cluster (cmsh commands). However, we use AD authentication for all our
> users, and when we try to submit jobs to Slurm as AD users we get the
> following error message.
>
>
>
>
>     srun: fatal: Invalid user id: 10952
>     srun: fatal: Invalid user id: 10952
>     srun: error: cnode001: task 0: Exited with exit code 1
>
>
>
>     Attached is the slurm.conf file for reference. Please let us know if
> you have any insight into this.
>
>     Thanks,
>     Yugi
>
>
>     Yugendra Guvvala | HPC Technologist  | Cambridge Computer  | "Artists
>      in Data Storage"
>     Direct: 781-250-3273  | Cell: 806-773-4464  |
> yguvvala at cambridgecomputer.com  | www.cambridgecomputer.com <
> http://www.cambridgecomputer.com>
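
The diagnosis suggested in this thread (the AD user's UID is known on the head node but not on the compute node) can be checked directly. The following is a minimal Python sketch, not something taken from the original posts: it asks the local name service whether a UID maps to an account, which is presumably the kind of lookup that fails when srun reports "Invalid user id". The UID 10952 comes from the error message quoted above; the function name uid_resolves is hypothetical.

```python
import pwd
import socket

# UID taken from the "Invalid user id: 10952" error quoted in the thread.
AD_USER_UID = 10952


def uid_resolves(uid: int) -> bool:
    """Return True if this host can map the UID to an account.

    pwd.getpwuid() goes through the host's normal name-service lookup,
    so it only finds AD users if this host is set up to resolve them.
    """
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        # No passwd entry for this UID is visible on this host.
        return False


if __name__ == "__main__":
    host = socket.gethostname()
    if uid_resolves(AD_USER_UID):
        print(f"{host}: UID {AD_USER_UID} resolves to an account")
    else:
        print(f"{host}: UID {AD_USER_UID} does NOT resolve to an account")
```

Run it once on the head node and once on cnode001. If the UID resolves on the head node but not on the compute node, that would be consistent with the compute node image not yet being configured for AD authentication.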