Hi,
Thanks, but isolated isn't the goal in my case. The goal is to save admin time we can't afford and to have a far-reaching setup.
So I have to link the HPC to IPA/IdM and on to AD in a trust; that way user admins can just drop a student or staff member into an AD group and the job is done. That also means we can use Globus to transfer large lumps of data globally in and out of the HPC.
I have taken your notes, as they look interesting for "extras"; for example, I have not looked at making GPUs work yet. If I can get the basics going, then I'll look at the icing.
regards
Steven
________________________________
From: Renfro, Michael Renfro@tntech.edu
Sent: Tuesday, 4 February 2025 8:51 am
To: Steven Jones steven.jones@vuw.ac.nz; slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com; Chris Samuel chris@csamuel.org
Subject: Re: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
Late to the party here, but depending on how much time you have invested, how much you can tolerate reformats or other more destructive work, etc., you might consider OpenHPC and its install guide ([1] for RHEL 8 variants, [2] or [3] for RHEL 9 variants, depending on which version of Warewulf you prefer). I’ve also got some workshop materials on building login nodes, GPU drivers, stateful provisioning, etc. for OpenHPC 3 and Warewulf 3 at [4].
At least in an isolated VirtualBox environment with no outside IdP or other dependencies, my student workers have usually been able to get their first batch job running within a day.
[1] https://github.com/openhpc/ohpc/releases/download/v2.9.GA/Install_guide-Rock...
[2] https://github.com/openhpc/ohpc/releases/download/v3.2.GA/Install_guide-Rock...
[3] https://github.com/openhpc/ohpc/releases/download/v3.2.GA/Install_guide-Rock...
[4] https://github.com/mikerenfro/openhpc-beyond-the-install-guide/blob/main/ohp...
From: Steven Jones via slurm-users slurm-users@lists.schedmd.com
Date: Sunday, February 2, 2025 at 5:48 PM
To: slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com, Chris Samuel chris@csamuel.org
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
________________________________
Hi,
I have never built an HPC before; it is all new to me, so I may be making "newbie errors". The old HPC has been dumped on us, so I am trying to rebuild it "professionally", shall we say, i.e. documented and stable, and I will train people to build it (all this with no money at all).
My understanding is that I log in as a normal user and run a job, and this worked for me last time. It is possible I have missed something.
[xxxjonesst@xxx.ac.nz@xxxunicoslurmd1 ~]$ cat testjob.sh
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=debug
#SBATCH --time=00:10:00
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
echo "Hello World"
echo "Hello Error" 1>&2
This worked on a previous setup; the outputs were in my home directory on the NFS server, as expected.
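For reference, a small sketch of which file names the script above should produce: in the --output and --error patterns Slurm replaces %x with the job name and %j with the job ID. The helper name expected_output_name and the job ID 1234 are made up for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: expand the %x (job name) and %j (job ID) placeholders
# used in the #SBATCH --output/--error lines. Assumes the job name contains
# no characters that are special to sed (such as / or &).
expected_output_name() {
    pattern="$1"; job_name="$2"; job_id="$3"
    printf '%s\n' "$pattern" | sed -e "s/%x/$job_name/g" -e "s/%j/$job_id/g"
}

# Example with a made-up job ID:
expected_output_name '%x_%j.out' test 1234
expected_output_name '%x_%j.err' test 1234
```

On a working cluster, "sbatch testjob.sh" reports the real job ID, and the two files should appear in the directory the job was submitted from.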
regards
Steven
________________________________
From: Chris Samuel via slurm-users slurm-users@lists.schedmd.com
Sent: Monday, 3 February 2025 11:59 am
To: slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld
On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:
[2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002
Looking at the source, that seems to come from this code:
    if (!_slurm_authorized_user(msg->auth_uid)) {
            error("Security violation, batch launch RPC from uid %u",
                  msg->auth_uid);
            rc = ESLURM_USER_ID_MISSING;    /* or bad in this case */
            goto done;
    }
and what it is calling is:
    /*
     * Returns true if "uid" is a "slurm authorized user" - i.e. uid == 0
     *   or uid == slurm user id at this time.
     */
    static bool _slurm_authorized_user(uid_t uid)
    {
            return ((uid == (uid_t) 0) || (uid == slurm_conf.slurm_user_id));
    }
Is it possible you're trying to run Slurm as a user other than root or the user designated as the "SlurmUser" in your config?
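One way to check this from the shell, as a rough sketch: it assumes "scontrol show config" prints a line of the form "SlurmUser = name(uid)", and the function names below are made up for illustration. The second function only mirrors the logic of the quoted C check; it is not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric UID out of the "SlurmUser = name(uid)"
# line of `scontrol show config` output supplied on stdin (assumed format).
slurm_user_uid() {
    sed -n 's/^SlurmUser[[:space:]]*=[[:space:]]*[^(]*(\([0-9]*\)).*$/\1/p'
}

# Mirror of the quoted check: a UID is accepted only if it is 0 (root)
# or equal to the configured SlurmUser UID.
is_slurm_authorized() {
    uid="$1"; slurm_uid="$2"
    [ "$uid" -eq 0 ] || [ "$uid" -eq "$slurm_uid" ]
}

# Intended use on a node with Slurm installed (not executed here):
#   slurm_uid=$(scontrol show config | slurm_user_uid)
#   is_slurm_authorized 12002 "$slurm_uid" || echo "uid 12002 would be rejected"
#   getent passwd 12002    # shows which account owns the UID from the log
```

If the UID in the log belongs to an ordinary login account rather than root or the SlurmUser, that would point at a daemon running under the wrong user.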
Also check that whoever you have set as the SlurmUser has the same UID everywhere (in fact, every user should).
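A minimal sketch of such a consistency check, assuming you can collect one "host uid" pair per line (for example over ssh); the function name and the host names in the comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "host uid" pairs on stdin and print every host
# whose UID differs from the first line's UID. Returns 0 if all match,
# 1 if any host disagrees.
check_uid_consistency() {
    awk 'NR == 1 { ref = $2; next }
         $2 != ref { print $1 ": uid " $2 " (expected " ref ")"; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Intended use (host names and the account name "slurm" are placeholders,
# not executed here):
#   for h in ctl node1 node2; do
#       echo "$h $(ssh "$h" id -u slurm)"
#   done | check_uid_consistency
```

The same loop can be pointed at any other account name to confirm that ordinary users also resolve to the same UID on every node.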
All the best, Chris