We only use an isolated setup on the students’ VirtualBox installs because it’s simpler for them to get started with. Our production HPC with OpenHPC is definitely integrated with our Active Directory (directly via sssd, not with an intermediate product), etc. Not everyone does it that way, but our scale is small enough that we’ve never had a load or other performance issue with our AD.
From: Steven Jones <steven.jones@vuw.ac.nz>
Date: Monday, February 3, 2025 at 2:14 PM
To: Renfro, Michael <Renfro@tntech.edu>, slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>, Chris Samuel <chris@csamuel.org>
Subject: Re: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
External Email Warning
This email originated from outside the university. Please use caution when opening attachments, clicking links, or responding to requests.
Thanks, but isolated isn't the goal in my case. The goal is to save admin time we can't afford and to have a far-reaching setup.
So I have to link the HPC to IPA/IdM and on to AD in a trust, so that user admins can just drop a student or staff member into an AD group and the job is done. That also means we can use Globus to transfer large lumps of data globally in and out of the HPC.
I have taken your notes, as they look interesting for "extras"; for example, I have not looked at making GPUs work yet. If I can get the basics going then I'll look at the icing.
From: Renfro, Michael <Renfro@tntech.edu>
Sent: Tuesday, 4 February 2025 8:51 am
To: Steven Jones <steven.jones@vuw.ac.nz>; slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>; Chris Samuel <chris@csamuel.org>
Subject: Re: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
Late to the party here, but depending on how much time you have invested, how much you can tolerate reformats or other more destructive work, etc., you might consider OpenHPC and its install guide ([1] for RHEL 8 variants, [2] or [3] for RHEL 9 variants, depending on which version of Warewulf you prefer). I’ve also got some workshop materials on building login nodes, GPU drivers, stateful provisioning, etc. for OpenHPC 3 and Warewulf 3 at [4].
At least in an isolated VirtualBox environment with no outside IdP or other dependencies, my student workers have usually been able to get their first batch job running within a day.
From: Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
Date: Sunday, February 2, 2025 at 5:48 PM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>, Chris Samuel <chris@csamuel.org>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
I have never built an HPC before; it is all new to me, so I may be making "newbie errors". The old HPC has been dumped on us, so I am trying to build it "professionally", shall we say, i.e. documented and stable, and I will train people to build it (all this with no money at all).
My understanding is that I log in as a normal user and run a job, and this worked for me last time. It is possible I have missed something:
[xxxjonesst@xxx.ac.nz@xxxunicoslurmd1 ~]$ cat testjob.sh
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=debug
#SBATCH --time=00:10:00
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
echo "Hello World"
echo "Hello Error" 1>&2
This worked on a previous setup; the outputs were in my home directory on the NFS server, as expected.
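For reference, the `--output` and `--error` patterns in that script use `%x` for the job name and `%j` for the job ID. The sketch below only imitates that expansion so the expected file names are easy to see; `expand_pattern` is a made-up helper, not a Slurm command, and the job ID 1234 is invented.

```shell
# Hypothetical helper: expand the %x (job name) and %j (job ID) patterns
# roughly the way sbatch does for --output/--error.
# Only these two patterns are handled, and the name must not contain "/".
expand_pattern() {
    pattern="$1"; name="$2"; jobid="$3"
    printf '%s\n' "$pattern" | sed -e "s/%x/$name/g" -e "s/%j/$jobid/g"
}

# Job name "test" comes from the script above; 1234 is an assumed job ID.
expand_pattern '%x_%j.out' test 1234
expand_pattern '%x_%j.err' test 1234
```

With relative patterns like these, the files land in the job's working directory, which by default is the directory `sbatch` was run from.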
From: Chris Samuel via slurm-users <slurm-users@lists.schedmd.com>
Sent: Monday, 3 February 2025 11:59 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:
> [2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002
Looking at the source, that seems to come from code like this:
    if (!_slurm_authorized_user(msg->auth_uid)) {
        error("Security violation, batch launch RPC from uid %u",
              msg->auth_uid);
        rc = ESLURM_USER_ID_MISSING; /* or bad in this case */
        goto done;
    }
and what it is calling is:
    /*
     * Returns true if "uid" is a "slurm authorized user" - i.e. uid == 0
     * or uid == slurm user id at this time.
     */
    static bool
    _slurm_authorized_user(uid_t uid)
    {
        return ((uid == (uid_t) 0) || (uid == slurm_conf.slurm_user_id));
    }
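To see what that check means for the uid in the log line, here is a rough shell model of the same test. It is only a sketch: `is_slurm_authorized` is a made-up helper, not part of Slurm, and the SlurmUser uid of 991 is an assumed value for illustration.

```shell
# Hypothetical helper mirroring _slurm_authorized_user():
# succeeds only if uid is 0 (root) or equals the SlurmUser uid.
is_slurm_authorized() {
    uid="$1"; slurm_uid="$2"
    [ "$uid" -eq 0 ] || [ "$uid" -eq "$slurm_uid" ]
}

# 12002 is the uid from the log line; 991 is an assumed SlurmUser uid.
if is_slurm_authorized 12002 991; then
    echo "uid 12002 would be accepted"
else
    echo "uid 12002 would be rejected"
fi
```

On a real system the second argument would be whatever `id -u` reports for the account named as SlurmUser in slurm.conf.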
Is it possible you're trying to run Slurm as a user other than root or
the user designated as the "SlurmUser" in your config?
Also check that whoever you have set as the SlurmUser has the same UID
everywhere (in fact every user should).
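One way to check the UID point is to collect `id -u` for the SlurmUser account on each node and compare the results. A minimal sketch, assuming you gather the values yourself (for example over ssh); `find_uid_mismatches`, the host names, and the uids are all hypothetical.

```shell
# Reads "host uid" pairs on stdin and prints every host whose uid
# differs from the first one seen. Prints nothing if they all agree.
find_uid_mismatches() {
    awk 'NR == 1 { want = $2; next }
         $2 != want { print $1 " has uid " $2 ", expected " want }'
}

# Sample input with made-up hosts and uids. On a real cluster each line
# could come from something like:  echo "$h $(ssh "$h" id -u slurm)"
printf '%s\n' "ctl1 991" "node1 991" "node2 12002" | find_uid_mismatches
```

The first line is treated as the reference, so it makes sense to list the controller first.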
All the best,