No,
[root@node5 log]# ls -la /etc/pam.d/*slurm*
ls: cannot access '/etc/pam.d/*slurm*': No such file or directory
Slurm is installed:
[root@node5 log]# rpm -qi slurm
Name : slurm
Version : 22.05.9
Release : 1.el9
Architecture: x86_64
Install Date: Thu Dec 12 21:02:12 2024
Group : Unspecified
Size : 6308503
License : GPLv2 and BSD
Signature : RSA/SHA256, Fri May 12 03:36:18 2023, Key ID 8a3872bf3228467c
Source RPM : slurm-22.05.9-1.el9.src.rpm
Build Date : Fri May 12 03:21:04 2023
Build Host : buildhw-x86-16.iad2.fedoraproject.org
Packager : Fedora Project
Vendor : Fedora Project
URL : https://slurm.schedmd.com/
Bug URL : https://bugz.fedoraproject.org/slurm
Summary : Simple Linux Utility for Resource Management
Description :
Slurm is an open source, fault-tolerant, and highly scalable
cluster management and job scheduling system for Linux clusters.
Components include machine status, partition management,
job management, scheduling and accounting modules.
[root@node5 log]#
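If it helps narrow things down, listing every installed Slurm subpackage would show whether the daemon packages (e.g. slurm-slurmd on Fedora/EPEL builds) are present alongside the base package; a quick check with standard rpm queries:

rpm -qa 'slurm*'   # list all installed Slurm subpackages
rpm -V slurm       # verify the base package's files against the rpm database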
regards
Steven Jones
B.Eng (Hons)
Technical Specialist - Linux RHCE
Victoria University, Digital Solutions,
Level 8 Rankin Brown Building,
Wellington, NZ
6012
0064 4 463 6272
From: Sean Crosby via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, 4 February 2025 12:46 pm
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
Just double checking. Can you check on your worker node:
- ls -la /etc/pam.d/*slurm*
[root@node5 log]# ls -la /etc/pam.d/*slurm*
ls: cannot access '/etc/pam.d/*slurm*': No such file or directory
[root@node5 log]#
(just checking if there's a specific pam file for slurmd on your system)
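For reference, such a file is ordinary PAM syntax; a minimal sketch of what a hypothetical /etc/pam.d/slurm might contain (contents assumed for illustration, not taken from this system, which has none):

account  required  pam_unix.so
session  required  pam_limits.so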
- scontrol show config | grep -i SlurmdUser
Cannot run it. I built it locally with rpmbuild, and this is failing:
[root@node5 log]# scontrol show config | grep -i SlurmdUser
slurm_load_ctl_conf error: Zero Bytes were transmitted or received
[root@node5 log]#
(checking if slurmd is set up with a different user to SlurmUser)
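Since scontrol can't reach slurmctld here, the same setting can be read straight from the configuration file instead (assuming the default path /etc/slurm/slurm.conf):

grep -iE 'Slurm(d)?User' /etc/slurm/slurm.conf   # shows SlurmUser and SlurmdUser if set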
- grep slurm /etc/passwd
[root@node5 log]# grep slurm /etc/passwd
slurm:x:12002:12002::/home/slurm:/bin/bash
slurm:x:12002:12002::/home/slurm:/bin/bash
[root@node5 log]#
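Note the slurm line printing twice from a grep of a single file: that means the entry is literally duplicated in /etc/passwd. A quick way to confirm, using nothing Slurm-specific:

awk -F: 'seen[$1]++ { print "duplicate:", $0 }' /etc/passwd   # flag any username appearing more than once
getent passwd slurm                                           # what NSS actually resolves for the account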
Sean
From: Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, 4 February 2025 08:56
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>; Christopher Samuel <chris@csamuel.org>
Subject: [EXT] [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
I rebuilt 4 nodes as Rocky 9.5.
8><---
[2025-02-03T21:40:11.978] Node node6 now responding
[2025-02-03T21:41:15.698] _slurm_rpc_submit_batch_job: JobId=17 InitPrio=4294901759 usec=501
[2025-02-03T21:41:16.055] sched: Allocate JobId=17 NodeList=node6 #CPUs=1 Partition=debug
[2025-02-03T21:41:16.059] Killing non-startable batch JobId=17: Invalid user id
[2025-02-03T21:41:16.059] _job_complete: JobId=17 WEXITSTATUS 1
[2025-02-03T21:41:16.060] _job_complete: JobId=17 done
So it's the same error going from RHEL 9.5 to Rocky 9.5.
🙁
Unless I am missing some sort of config setting, I am out of permutations I can try.
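"Invalid user id" at batch launch generally means slurmd on the compute node could not resolve the job's UID. A quick sanity check worth running (someuser is a placeholder for the account that submitted JobId=17):

id someuser              # on the submit/head node
ssh node6 id someuser    # on the worker; UID and GID must match on both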
From: Christopher Samuel via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, 4 February 2025 10:13 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld