[root@node5 log]# rpm -qi slurm

Install Date: Thu Dec 12 21:02:12 2024

License : GPLv2 and BSD

Signature : RSA/SHA256, Fri May 12 03:36:18 2023, Key ID 8a3872bf3228467c

Source RPM : slurm-22.05.9-1.el9.src.rpm

Build Date : Fri May 12 03:21:04 2023

Build Host : buildhw-x86-16.iad2.fedoraproject.org

Packager : Fedora Project

Vendor : Fedora Project

URL : https://slurm.schedmd.com/

Bug URL : https://bugz.fedoraproject.org/slurm

Summary : Simple Linux Utility for Resource Management

Slurm is an open source, fault-tolerant, and highly scalable

cluster management and job scheduling system for Linux clusters.

Components include machine status, partition management,

job management, scheduling and accounting modules.

From: Sean Crosby via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, 4 February 2025 12:46 pm
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

Just double checking. Can you check on your worker node

ls -la /etc/pam.d/*slurm*

ls: cannot access '/etc/pam.d/*slurm*': No such file or directory

(just checking if there's a specific pam file for slurmd on your system)

slurm_load_ctl_conf error: Zero Bytes were transmitted or received

(checking if slurmd is set up with a different user to SlurmUser)

slurm:x:12002:12002::/home/slurm:/bin/bash

From: Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, 4 February 2025 08:56
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>; Christopher Samuel <chris@csamuel.org>
Subject: [EXT] [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

External email: Please exercise caution

I rebuilt 4 nodes as rocky9.5

8><---

[2025-02-03T21:40:11.978] Node node6 now responding

[2025-02-03T21:41:15.698] _slurm_rpc_submit_batch_job: JobId=17 InitPrio=4294901759 usec=501

[2025-02-03T21:41:16.055] sched: Allocate JobId=17 NodeList=node6 #CPUs=1 Partition=debug

[2025-02-03T21:41:16.059] Killing non-startable batch JobId=17: Invalid user id

[2025-02-03T21:41:16.059] _job_complete: JobId=17 WEXITSTATUS 1

[2025-02-03T21:41:16.060] _job_complete: JobId=17 done

So same error RHEL9.5 to Rocky9.5

Unless I am missing some sort of config setting, I am out of permutations I can try.

From: Christopher Samuel via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, 4 February 2025 10:13 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

On 2/3/25 2:33 pm, Steven Jones via slurm-users wrote:

> Just built 4 x rocky9 nodes and I do not get that error (but I get
> another I know how to fix, I think) so holistically I am thinking the
> version difference is too large.

Oh I think I missed this - when you say version difference do you mean
the Slurm version or the distro version?

I was assuming you were building your Slurm versions yourselves for
both, but that may be way off the mark, sorry!

What are the Slurm versions everywhere?

All the best,
Chris
--
Chris Samuel : https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.csamuel.org%2F&data=05%7C02%7Csteven.jones%40vuw.ac.nz%7Cfe1ff64e721d43ea401308dd44994db6%7Ccfe63e236951427e8683bb84dcf1d20c%7C0%7C0%7C638742147471778617%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=MLsx9QWCn%2BEitmzGq8Z11Sc9SCN3LqZhNetc5DpLmcA%3D&reserved=0 : Berkeley, CA, USA

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com