Hi,
Thanks for the reply. I already went through this 🙁. I checked all nodes: id works, as does an ssh login.
[root@node4 ~]# id xxxjonesst@xxx.ac.nz
uid=1204805830(xxxjonesst@xxx.ac.nz) gid=1204805830(xxxjonesst@xxx.ac.nz)
8><---
Connection to node1 closed.
[root@xxxunicobuildt1 warewulf]# ssh xxxjonesst@xxx.ac.nz@node4
(xxxjonesst@xxx.ac.nz@node4) Password:
[xxxjonesst@xxx.ac.nz@node4 ~]$ whoami
xxxjonesst@xxx.ac.nz
[xxxjonesst@xxx.ac.nz@node4 ~]$
From: Chris Samuel via slurm-users <slurm-users@lists.schedmd.com>
Sent: Monday, 3 February 2025 10:00 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: RHEL8.10 V slurmctld
On 29/1/25 10:44 am, Steven Jones via slurm-users wrote:
> "[2025-01-28T21:48:50.271] sched: Allocate JobId=4 NodeList=node4 #CPUs=1
> Partition=debug
> [2025-01-28T21:48:50.280] Killing non-startable batch JobId=4: Invalid
> user id"
Looking at the source code, it looks like that second error is reported
by slurmctld after it sends the RPC out to the compute node and gets a
response back, so I would look at what's going on with node4 to see
what's being reported there.
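One check on node4 that `id <name>` does not cover is the reverse lookup, from the numeric UID back to a name. Below is a minimal sketch of a round-trip check, assuming (the thread does not confirm this) that the failure involves resolving the job's UID on the compute node. The helper names `check_round_trip` and `report` are hypothetical; `getent passwd <uid>` is the shell equivalent of the second lookup.

```python
import pwd


def check_round_trip(username):
    """Resolve name -> UID -> name through the local name service.

    Raises KeyError if either lookup fails.
    """
    entry = pwd.getpwnam(username)        # forward lookup, what `id <name>` exercises
    reverse = pwd.getpwuid(entry.pw_uid)  # reverse lookup by numeric UID
    return entry.pw_uid, reverse.pw_name


def report(username):
    """Return a one-line summary of the round trip for the given account."""
    try:
        uid, resolved = check_round_trip(username)
    except KeyError as exc:
        return f"FAILED: {username}: {exc}"
    # A different name coming back means the UID maps to another entry.
    status = "OK" if resolved == username else "MISMATCH"
    return f"{status}: {username} -> uid {uid} -> {resolved}"
```

Calling `print(report("xxxjonesst@xxx.ac.nz"))` on node4 would show whether both directions resolve. A FAILED or MISMATCH line would suggest looking at the node's name service setup before digging further into Slurm.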
All the best,
Chris