Hi folks
I'm in the process of standing up some Stampede2 Dell C3620p KNL nodes and I seem to be hitting a blind spot.
I previously "successfully" configured KNL's on an Intel board (S7200AP), with OpenHPC and Rocky8. I say "successfully" because it works but evidently my latest troubleshooting has revealed that I may have been lucky rather than an expert KNL integrator :-)
I thought I knew what I was doing, but after repeating my Intel KNL recipe with the Dell system, I have unearthed my ignorance with this wonderful (but deprecated) technology. (Anecdotally, the KNLs offer excellent performance and power efficiency for our workloads, particularly when contrasted with our alternative available hardware.)
The first discovery was that the "syscfg" utility for Intel boards is not the same as the "syscfg" for Dell boards. I've since sorted this out.
The second discovery was made while troubleshooting an issue that I'm hitting: the slurmd client nodes don't seem to be reading the "knl_generic.conf" parameters that are specified in /etc/slurm on the smshost (OpenHPC parlance for the head node; it's a configless Slurm setup). After realising this, I think my original Intel solution was working out of luck more than ingenuity.
For reference, the Slurm configuration for KNL now includes:
```
NodeFeaturesPlugins=knl_generic
DebugFlags=NodeFeatures
GresTypes=hbm
```
And I've created a separate "knl_generic.conf" that points to the Dell-specific tools and features.
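Roughly speaking, it contains something like this (the syscfg path below is just a placeholder for wherever Dell's syscfg utility lives on these nodes):
```
# /etc/slurm/knl_generic.conf (sketch)
SystemType=Dell
SyscfgPath=/opt/dell/toolkit/bin/syscfg   # placeholder path for the Dell syscfg tool
SyscfgTimeout=10000
```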
For the Dell system, slurmd seems to ignore my knl_generic.conf file and draws defaults from somewhere else: Slurm still considers SystemType to be Intel, SyscfgPath to be the default location, and SyscfgTimeout to be 1000. For Dell systems it needs SystemType=Dell, the Dell syscfg path, and (per the knl.conf man page) a larger SyscfgTimeout of 10000.
I don't understand why the nodes are not reading the knl_generic file - any help or clues would be appreciated.
Here's my theory on what is happening:
The Intel KNL system probably exhibited the same ignore-the-config-file behaviour, but the NodeFeatures plugin fell back to built-in defaults that happen to be compatible with Intel hardware, so I lucked out and it worked anyway.
If that assumption is correct, the Dell system is not working because it isn't compatible with those Intel defaults.
Any clues on how to successfully invoke the config file (or better debugging techniques to figure out why it isn't being read) would be appreciated.
I can share journalctl output if necessary. For now, I've tried changing ownership of the config files to root:slurm, copying knl_generic.conf to the compute nodes' /etc/slurm/, and specifying the config file explicitly by running slurmd with "-f" on the compute nodes. No joy: whenever slurmd starts successfully (i.e. when I don't break something with a random experimental setting), it ignores knl_generic.conf and loads default settings from somewhere.
A few questions:
1. Are there default settings stored somewhere? I might be barking up the wrong tree, although I've looked for files that may clash with the config file I've created but can't find any.
2. Is there a better way to force the knl_generic file to be loaded?
3. Is configless Slurm somehow not serving the knl_generic file to the clients? I understand that all configuration files are supposed to be fetched from the head node.
Many thanks for any help!
Regards / Groete / Sala(ni) Kahle
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Bryan Johnston
Senior HPC Technologist II
Lead: HPC Ecosystems Project
HPCwire 2024's Outstanding Leader in HPC
CHPC | www.chpc.ac.za | NICIS | nicis.ac.za
Centre for High Performance Computing
If you receive an email from me out of office hours for you, please do not feel obliged to respond during off-hours!
Book time to meet with me<https://outlook.office.com/bookwithme/user/87af4906a703488386578f34e4473c74…>
Does anyone have any experience with using Kerberos/GSSAPI and Slurm? I’m specifically wondering if there is a known mechanism for providing proper Kerberos credentials to Slurm batch jobs, such that those processes would be able to access a filesystem that requires Kerberos credentials. Some quick searching returned nothing useful. Interactive jobs have a similar problem, but I’m hoping that SSH credential forwarding can be leveraged there.
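For the interactive side, what I have in mind is the standard OpenSSH GSSAPI delegation options on the client, something like the sketch below (the Host pattern is only an example):
```
# ~/.ssh/config (client side) -- standard OpenSSH GSSAPI options
Host compute-*
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
```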
I’m nothing like an expert in Kerberos, so forgive any apparent ignorance.
Thanks,
John
Hi,
I have defined a partition for each GPU type we have in the cluster. This was mainly because I have different node types for each GPU type and I want to set `DefCpuPerGPU` and `DefMemPerGPU` for each of them. Unfortunately these can't be set per node, but they can be set per partition.
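For illustration, the partitions look roughly like this (node names and default values here are made up):
```
# Illustrative slurm.conf partition definitions, one per GPU type
PartitionName=A Nodes=gpu-a-[01-04] DefCpuPerGPU=8  DefMemPerGPU=32000
PartitionName=B Nodes=gpu-b-[01-04] DefCpuPerGPU=16 DefMemPerGPU=64000
```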
Now sometimes people don't care about the GPU type and would like any of the partitions to pick up the job. The `--partition` option in `sbatch` does allow specifying multiple partitions, and this works fine when I'm not specifying `--gpus`. However, when I do something like `sbatch -p A,B --gpus 1 script.sh`, I get "srun: job 6279 queued and waiting for resources" even though partition B does have a GPU to offer. Strangely, if the first partition specified (i.e. A) has a free GPU, the job gets the GPU and runs.
Is this a bug? Perhaps related to this: https://groups.google.com/g/slurm-users/c/UOUVfkajUBQ
Hello,
Yes, I know the system is single-socket, but on a similar computer this configuration works fine... After some tests, I have realized that with SLURM 23.11.0 I can assign CPUs in "gres.conf" (and then use half of the CPUs for one GPU and the other half for the other GPU), but with SLURM 24.11.4 I receive the "_check_core_range_matches_sock" error. I have read the "NEWS" file inside the 24.11.4 source tarball and it says:
[…]
* Changes in Slurm 23.11.7
==========================
[…]
-- Fix issue that only left one thread on each core available when "CPUs=" is
configured to total thread count on multi-threaded hardware and no other
topology info ("Sockets=", "CoresPerSocket", etc.) is configured.
[…]
So, I suppose, in version 23.11.7 SLURM corrected that behaviour. Could someone confirm that?
Thanks.
Hello,
I have compiled SLURM-24.11.3 and I have configured two GPUs in my system (slurmctld and slurmd are running on the same computer). The compute node has an old Intel i7 processor with 4 cores and hyper-threading (8 logical CPUs). The node is configured with "NodeName=mysystem CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7940 Gres=gpu:geforce_gtx_titan_x:1,gpu:geforce_gtx_titan_black:1". The "lscpu" command returns:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
CPU family: 6
Model: 26
Model name: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
BIOS Model name: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
The gres.conf file is:
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_x File=/dev/nvidia0 CPUs=0-1
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_black File=/dev/nvidia1 CPUs=2-3
However, when I start the "slurmctld" daemon, the system returns this error:
[2025-04-28T09:35:41.003] error: _check_core_range_matches_sock: gres/gpu GRES core specification 0-1 for node aopcvis5 doesn't match socket boundaries. (Socket 0 is cores 0-3)
[2025-04-28T09:35:41.003] error: Setting node aopcvis5 state to INVAL with reason:gres/gpu GRES core specification 0-1 for node aopcvis5 doesn't match socket boundaries. (Socket 0 is cores 0-3)
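For reference, the only core specification that would match the socket boundary reported in the error (socket 0 = cores 0-3, and there is only one socket) seems to be binding both GPUs to the whole socket, as in the sketch below, which defeats the purpose of splitting the CPUs between the two GPUs:
```
# Socket-aligned sketch ("Cores=" is the current spelling of "CPUs=", as far as I know)
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_x File=/dev/nvidia0 Cores=0-3
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_black File=/dev/nvidia1 Cores=0-3
```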
Where is my configuration error?
Thanks.
We would like to put limits on interactive jobs (started by salloc) so
that users don't leave unused interactive jobs behind on the cluster by
mistake.
I can't offhand find any configurations that limit interactive jobs, such
as enforcing a timelimit.
Perhaps this could be done in job_submit.lua, but I couldn't find any
job_desc parameters in the source code which would indicate if a job is
interactive or not.
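For what it's worth, the kind of rule I have in mind would look like the sketch below; it assumes that interactive (salloc/srun) jobs arrive without a batch script in job_desc.script, which is exactly the part I have not been able to confirm from the source:
```
-- job_submit.lua sketch (untested): cap interactive jobs at 8 hours.
-- Assumption: batch jobs carry a script in job_desc.script, interactive jobs don't.
function slurm_job_submit(job_desc, part_list, submit_uid)
   local max_interactive = 8 * 60  -- minutes
   if job_desc.script == nil or job_desc.script == '' then
      if job_desc.time_limit == slurm.NO_VAL or job_desc.time_limit > max_interactive then
         job_desc.time_limit = max_interactive
         slurm.log_user("Interactive job: time limit capped at 8 hours")
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```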
Question: How do people limit interactive jobs, or identify orphaned jobs and kill them?
Thanks a lot,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
Hi all,
I'm trying to set up a QoS on a small 5-node cluster running Slurm 24.05.7. My goal is to limit resources on a (time x number of cores) basis, to avoid one large job requesting all the resources for too long. I've read https://slurm.schedmd.com/qos.html and some discussions, but my setup is still not working.
I think I need to set these parameters:
MaxCPUsPerJob=172800
MaxWallDurationPerJob=48:00:00
Flags=DenyOnLimit,OverPartQOS
for:
12h max for 240 cores => (12*240*60 = 172800 min)
no job can exceed 2 days
do not accept jobs out of these limits.
What I've done:
1) create the QoS:
sudo sacctmgr add qos workflowlimit \
MaxWallDurationPerJob=48:00:00 \
MaxCPUsPerJob=172800 \
Flags=DenyOnLimit,OverPartQOS
2) Check
sacctmgr show qos Name=workflowlimit format=Name%16,MaxTRES,MaxWall
Name MaxTRES MaxWall
---------------- ------------- -----------
workflowlimit cpu=172800 2-00:00:00
3) Set the QoS for the account "most" which is the default account for
the users:
sudo sacctmgr modify account name=most set qos=workflowlimit
4) Check
$ sacctmgr show assoc format=account,cluster,user,qos
Account Cluster User QOS
---------- ---------- ---------- --------------------
root osorno normal
root osorno root normal
legi osorno normal
most osorno workflowlimit
most osorno begou workflowlimit
5) Modifiy slurm.conf with:
AccountingStorageEnforce=limits,qos
and propagate on the 5 nodes and the front end (done via Ansible)
6) Check
clush -b -w osorno-fe,osorno,osorno-0-[0-4] 'grep
AccountingStorageEnforce /etc/slurm/slurm.conf'
---------------
osorno,osorno-0-[0-4],osorno-fe (7)
---------------
AccountingStorageEnforce=limits,qos
7) restart slurmd on all the compute nodes and slurmctld + slurmdbd on
the management node.
But I can still request 400 cores for 24 hours:
[begou@osorno ~]$ srun -n 400 -t 24:0:0 --pty bash
bash-5.1$ squeue
JOBID PARTITION NAME USER ST TIME
START_TIME TIME_LIMIT CPUS NODELIST(REASON)
147 genoa bash begou R 0:03
2025-04-18T16:52:11 1-00:00:00 400 osorno-0-[0-4]
So I must have missed something?
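In case it helps, this is the check I can run to see which QoS and limits the job actually picked up (assuming the standard sacct format fields):
```
# Which account / QoS / time limit / CPU count did job 147 actually get?
sacct -j 147 --format=JobID,Account,QOS,Timelimit,AllocCPUS
```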
My partition (I've only one) in slurm.conf is:
PartitionName=genoa State=UP Default=YES MaxTime=48:00:00
DefaultTime=24:00:00 Shared=YES OverSubscribe=NO Nodes=osorno-0-[0-4]
Thanks
Patrick
Good morning,
I'm running an NPB test, bt.C that is OpenMP and built using NV HPC SDK
(version 25.1). I run it on a compute node by ssh-ing to the node. It runs
in about 19.6 seconds.
Then I run the code using a simple job:
Command to submit job: sbatch --nodes=1 run-npb-omp
The script run-npb-omp is the following:
#!/bin/bash
cd /home/.../NPB3.4-OMP/bin
./bt.C.x
When I use Slurm, the job takes 482 seconds.
Nothing really appears in the logs. It doesn't do any I/O, and no data is copied anywhere. I'm kind of at a loss to figure out why. Any suggestions of where to look?
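In case it's useful, here's a small diagnostic I could add to the top of the script to record what the allocation actually looks like (these are the standard Slurm environment variables, as far as I know):
```
# Added at the top of run-npb-omp to record what the job actually gets
echo "SLURM_CPUS_ON_NODE = ${SLURM_CPUS_ON_NODE:-unset}"
echo "SLURM_JOB_CPUS_PER_NODE = ${SLURM_JOB_CPUS_PER_NODE:-unset}"
echo "OMP_NUM_THREADS = ${OMP_NUM_THREADS:-unset}"
nproc
```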
Thanks!
Jeff
Hi all, I'm trying to clean up and reconfigure fair share on a Slurm 20.11.9 production cluster after some trial and error that happened before I started looking into it; I don't know the full story and need to pick it up from here. Fair share is enabled with default settings and not customised. It looks a bit like it was enabled by accident, with the relevant options left undefined/at their defaults.
As a first step I would like to understand the current state. Executing sshare includes output like this (output shortened but complete for the two relevant accounts):
Account User Partition RawShares NormShares RawUsage EffectvUsage FairShare
-------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
root 0.000000 1002501 1.000000
.root root 1 0.083333 0 0.000000 1.000000
.A1 1 0.083333 0 0.000000
..A1 U1 P1 1 0.166667 0 0.000000 0.873684
..A1 U2 P1 1 0.166667 0 0.000000 0.873684
..A1 U3 P1 1 0.166667 0 0.000000 0.873684
..A1 U3 P2 1 0.166667 0 0.000000 0.873684
..A1 U4 P3 1 0.166667 0 0.000000 0.873684
..A1 U4 P1 1 0.166667 0 1.000000 0.821053 <== What is going on here??
.A2 1 0.083333 0 0.000000
..A2 U5 P2 1 0.142857 0 0.000000 1.000000
..A2 U5 P1 1 0.142857 0 0.000000 1.000000
..A2 U6 P4 1 0.142857 0 0.000000 1.000000
..A2 U6 P1 1 0.142857 0 0.000000 1.000000
..A2 U6 P2 1 0.142857 0 0.000000 1.000000
..A2 U7 P1 1 0.142857 0 0.000000 1.000000
..A2 U7 P2 1 0.142857 0 0.000000 1.000000
User U4 is not a member of any other account.
I understand everything about this output except the line I marked. Both accounts A1 and A2 have zero usage, yet for user U4 on partition P1 we have effective usage 1.0, which screws up the fair share factors for everyone in this account. As far as I can tell, both accounts should look identical, with a fair share factor of 1 for every user.
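In case it matters, this is the command I plan to use to dig a bit deeper into account A1 (assuming these sshare format fields exist in 20.11):
```
# Show shares, usage and LevelFS for account A1 and its users
sshare -a -A A1 --format=Account,User,RawShares,NormShares,RawUsage,NormUsage,EffectvUsage,LevelFS,FairShare
```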
I'm grateful for any pointer for what to look for.
Best regards,
Frank
Happy Monday everybody,
I've gotten a request to have Slurm notify users for the typical email
things (job started, completed, failed, etc) with a REST API instead of
email. This would allow notifications in MS Teams, Slack, or log stuff in
some internal websites and things like that.
As far as I can tell, Slurm does not support that, for example there was
somebody who was looking for that on Galaxy and did not find a solution:
https://help.galaxyproject.org/t/web-hook-post-to-external-url-…when-job-beg…
Is that indeed the case, as searching the web indicates?
If Slurm does not support this, is there a workaround? For example, I'm thinking of installing a local SMTP server, or an alternative/dummy mailx program which, instead of relaying the email, would post the information from it to a web hook URL. I am sure I could write such a tool myself, but I don't have enough time to dedicate to its design, maintenance and debugging, so I am looking for something decent that already exists. A cursory web search did not find anything suitable, but perhaps I did not look in the appropriate places, because my gut feeling is that somebody must have already had this itch to scratch!
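To make the idea concrete, here's the shape of the thing I have in mind (the script path and webhook URL are placeholders, and I haven't written or tested this): point MailProg in slurm.conf at a small wrapper that forwards the notification to a web hook instead of sending mail.
```
#!/bin/bash
# /usr/local/sbin/slurm_notify.sh -- hypothetical replacement for mailx,
# wired up in slurm.conf with:  MailProg=/usr/local/sbin/slurm_notify.sh
# Slurm invokes the mail program roughly as:  mailprog -s "<subject>" <recipient>...
# with the message body on stdin, so grab those and POST them instead.
WEBHOOK_URL="https://example.invalid/slurm-webhook"   # placeholder endpoint

subject=""
while getopts "s:" opt; do
    case "$opt" in
        s) subject="$OPTARG" ;;
    esac
done
shift $((OPTIND - 1))
recipients="$*"
body="$(cat)"

# Keep it simple: send the subject (which already carries job id and state),
# recipients and body as form fields; adapt to a Teams/Slack payload as needed.
curl -fsS -X POST \
     --data-urlencode "subject=${subject}" \
     --data-urlencode "recipients=${recipients}" \
     --data-urlencode "body=${body}" \
     "${WEBHOOK_URL}" >/dev/null 2>&1 || true
```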
Any other ideas about alternative ways to accomplish this?
Thanks