- slurm-users - lists.schedmd.com

Re: how to set slurmdbd.conf if using two slurmdb node with HA database?
by taleintervenor＠sjtu.edu.cn 27 Feb '25

27 Feb '25

Do you mean the second configuration scheme? I think configuring `dbdhost=localhost` is the same as configuring ` DbdAddr =nodeC` and ` DbdAddr =nodeD` on the two nodes respectively. The key point is whether we should set the DbdBackupHost option and how it work? 发件人: Daniel Letai <dani(a)letai.org.il> 发送时间: 2025年2月19日 18:21 收件人: slurm-users(a)lists.schedmd.com 主题: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database? I'm not sure it will work, didn't test it, but could you just do `dbdhost=localhost` to solve this? On 18/02/2025 11:59, hermes via slurm-users wrote: The deployment scenario is as follows: nodeA nodeB (slurmctld) (backup slurmctld) | \-------------------------------/ | | / \ | nodeC nodeD (slurmdbd) (backup slurmdbd) (mysql) <--multi master replica--> (mysql) Since the database is multi-master replicated, the slurmdbd should only talk to the mysql on its own node. In such case, how should we set the slurmdbd.conf? The conf file contains options “DbdAddr”, “DbdHost” and “DbdBackupHost”. Should they be consistent between nodeA-2 and nodeB-2? Such as: DbdAddr = nodeC | DbdAddr = nodeC DbdHost = nodeC | DbdHost = nodeC DbdBackupHost = nodeD | DbdBackupHost = nodeD StorageHost = nodeC | StorageHost = nodeD Or maybe just set different conf and don’t use the “DbdBackupHost” like: DbdAddr = nodeC | DbdAddr = nodeD DbdHost = nodeC | DbdHost = nodeD StorageHost = nodeC | StorageHost = nodeD I’m quite confused about the usage of DbdAddr and DbdHost. What is the difference between them and why only DbdHost has the backup one? Another confusing point is how DbdBackupHost work. I guess It is slurmctld that is responsible for selecting the available slurmdbd. Since the slurm.conf already contains “AccountingStorageHost” and “AccountingStorageBackupHost”, why we need set backupdbd again on slurmdbd side?

4 10

Limit CPUs per job (but not per user, partition or node)
by Herbert Fruchtl 26 Feb '25

26 Feb '25

We have a cluster with multi-core nodes (168) that can be shared by multiple jobs at the same time. How do I configure a partition such that it only accepts jobs requesting up to (say) 8 cores, but will run multiple jobs at the same time? The following is apparently not working: PartitionName=debug Nodes=node01 MaxTime=02:00:00 DefMemPerCPU=1000 MaxCPUsPerNode=8 Default=NO It allows one job using 8 cores, but a second one will not start because the limit is apparently for the partition as a whole. Thanks in advance, Herbert -- Herbert Fruchtl (he/him) Senior Scientific Computing Officer / HPC Administrator School of Chemistry, IT Services University of St Andrews -- The University of St Andrews is a charity registered in Scotland: No SC013532

3 2

Re: MinTRES in QoS and power saving
by Patryk Bełzak 26 Feb '25

26 Feb '25

Hi, there was this issue raised some time ago: https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg10799.html We're experiencing exactly the same issue now with GPU nodes in power saving, some (but not all) jobs doesn't start because of that, and it's annoying users - badly. Anyone experienced something similiar? Or maybe Stefan has a solution? (we're on version 24.05.5) Best regards Patryk Belzak. -- HPC Administrator Wroclaw Centre for Networking and Supercomputing

1 0

Shard conf weirdness
by Reed Dier 24 Feb '25

24 Feb '25

Hoping someone can help me pin down the weirdness I’m experiencing. There are actually two issues, I’ve run into, the root issue, and then something odd when trying to work around the root issue. v23.11.10 - Ubuntu 22.04 - slurm-smd debs built <https://slurm.schedmd.com/archive/slurm-23.11.10/quickstart_admin.html#debu…> from the tarball. I have 2 slurmctld daemons, and 1 slurmdbd daemon. The slurm.conf is consistent across the cluster, all ctld,dbd,slurmd, have the same shasum hash. The backup slurmctld does not appear to like the gres configured on my gpu nodes, despite the primary slurmctld having no issues. When failing over with scontrol takeover, I get the following log messages on the secondary slurmctld, where it complains about a reported, but not configured typed gres/shard. > [2025-02-21T05:02:02.017] error: Setting node $HOST2 state to INVAL with reason:gres/shard type (p100) reported but not configured > [2025-02-21T05:02:02.018] drain_nodes: node $HOST2 state set to DRAIN > [2025-02-21T05:02:02.018] error: _slurm_rpc_node_registration node=$HOST2: Invalid argument > [2025-02-21T05:02:02.020] error: Setting node $HOST3 state to INVAL with reason:gres/shard type (p40) reported but not configured > [2025-02-21T05:02:02.020] drain_nodes: node $HOST3 state set to DRAIN > [2025-02-21T05:02:02.020] error: _slurm_rpc_node_registration node=$HOST3: Invalid argument > [2025-02-21T05:02:02.023] error: Setting node $HOST1 state to INVAL with reason:gres/shard type (t4) reported but not configured > [2025-02-21T05:02:02.020] drain_nodes: node $HOST1 state set to DRAIN > [2025-02-21T05:02:02.023] error: _slurm_rpc_node_registration node=$HOST1: Invalid argument And looking at those hosts in the slurm.conf, the shards are not typed, but generic. > NodeName=$HOST1 [SNIP] State=UNKNOWN Gres=gpu:t4:2,shard:8 > NodeName=$HOST2 [SNIP] State=UNKNOWN Gres=gpu:p100:2,shard:8 > NodeName=$HOST3 [SNIP] State=UNKNOWN Gres=gpu:p40:2,gpu:p100:2,shard:16 I should also point out at this point that my gres.conf is just AutoDetect=nvml. I am not explicitly mapping any devices. So this is problem 1, the slurmctld behaving differently between daemons, where the primary has zero issue with the configuration, but the secondary balking and draining the nodes consistently. Moving on to problem 2, which is me trying to solve the issue, and running into a different issue. I decided to try to add the typed gres/shards to both the AccountingStorageTRES, and the NodeName lists. > NodeName=$HOST1 [SNIP] State=UNKNOWN Gres=gpu:t4:2,shard:t4:8 > NodeName=$HOST2 [SNIP] State=UNKNOWN Gres=gpu:p100:2,shard:p100:8 > NodeName=$HOST3 [SNIP] State=UNKNOWN Gres=gpu:p40:2,gpu:p100:2,shard:p40:8,shard:p100:8 Now the secondary slurmctld is happy and no longer immediately drains the gpu nodes for the invalid gres. However, the node ($HOST3) with a mixed set of gpus is now complaining. > [2025-02-23T21:41:05.311] gpu/nvml: _get_system_gpu_list_nvml: 4 GPU system device(s) detected > [2025-02-23T21:41:05.311] fatal: _build_shared_list: bad configuration, multiple configurations without "File" No matter what I tried, any line of the NodeName with multiple shard:$type:$num generates the _build_shared_list error above. I tried re-ordering the list as gpu,shard,gpu,shard to no avail, I tried having untyped shards, with typed shards after that, but then it complained about too many shards (double the intended number, $untyped + $typed = too many) So my current solution is to have the slurm conf in sync everywhere BUT the host(s) with multiple gpu models in the same host, where I had to revert to untyped gres on those slurmd’s, but on the slurmctlds they are typed gres. Hopefully I’ve done a decent job of explaining the corner case, enough that someone can point me in the direction of figuring out whats going on, and what is the “correct” way of doing this. I tried increasing the slurmd and slurmctld logging to debug2, to nothing that stood out beyond what was already gathered. > [2025-02-23T21:46:44.895] debug: GRES[shard] Type:p100 Count:8 Cores(88):(null) Links:(null) Flags:HAS_TYPE File:(null) UniqueId:(null) > [2025-02-23T21:46:44.895] debug: GRES[shard] Type:p40 Count:8 Cores(88):(null) Links:(null) Flags:HAS_TYPE File:(null) UniqueId:(null) > [2025-02-23T21:46:44.895] fatal: _build_shared_list: bad configuration, multiple configurations without "File" Any ideas are greatly appreciated, Reed

1 0

Plese help [CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1]
by Hugo Solís 24 Feb '25

24 Feb '25

I am installing slurm on a small cluster of 3 identical computers at the university. It turns out that I get the message when starting: error: NodeNames=hugo-big-1 CPUs=24 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs. error: NodeNames=hugo-big-2 CPUs=24 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs. error: NodeNames=hugo-big-3 CPUs=24 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs. if I run sudo slurmd -C I get NodeName=hugo-big-1 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128586 reading many post it seems there is a bug starting in version 18. However I had modify the slurm.conf file NodeName=hugo-big-1 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128586 Gres=gpu:nvidia_geforce_rtx_3090:2 to several different version where I do not specify CPUs or sockets and I failed everytime. I do need help, any idea? warm regards Hugo Solis BTW. if I run lscpu I got that any product of Thread, cores or sockets do not result in 24. Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 24 On-line CPU(s) list: 0-23 Vendor ID: GenuineIntel Model name: 12th Gen Intel(R) Core(TM) i9-12900K CPU family: 6 Model: 151 Thread(s) per core: 2 Core(s) per socket: 16 Socket(s): 1 Stepping: 2 CPU max MHz: 5200.0000 CPU min MHz: 800.0000 BogoMIPS: 6374.40 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_ti mer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_e pp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities Virtualization features: Virtualization: VT-x Caches (sum of all): L1d: 640 KiB (16 instances) L1i: 768 KiB (16 instances) L2: 14 MiB (10 instances) L3: 30 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-23

2 2

Please help - Building Slurm-24.11.1 Failed
by Zhang, Yuan 23 Feb '25

23 Feb '25

Hello, I got errors about missing perl modules when building slurm24.11.1 rpm packages. Has anyone seen this error before? And how to fix it? Here are the error messages: -- Processing files: slurm-perlapi-24.11.1-1.el8.x86_64 error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Bitstr.pm error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Constant.pm error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Hostlist.pm error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurm/Slurm.so error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/Slurmdb.so error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/autosplit.ix RPM build errors: Macro expanded in comment on line 31: %_prefix path install path for commands, libraries, etc. Macro expanded in comment on line 240: %define _unpackaged_files_terminate_build 0 File listed twice: /usr/lib/.build-id/49/9394dac8b96530e1a29d0e35608ef9acedec70 File listed twice: /usr/lib/.build-id/88/4e3a854911920ae0c77c4b32d5af70dad538a9 File listed twice: /usr/lib/.build-id/c1/72dbf51fe1c17868a4e0dbf0aa55a7923f6d02 File listed twice: /usr/lib/.build-id/dc/67c93438d36986971444d218350d27b413c2cb File listed twice: /usr/lib/.build-id/ee/e79f70a62bf62e74705a75daa87d26857452ef Deprecated external dependency generator is used! File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Bitstr.pm File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Constant.pm File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Hostlist.pm File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurm/Slurm.so File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/Slurmdb.so File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/autosplit.ix -- Thanks! Yuan -- Yuan Zhang Senior HPC System Programmer Nationwide Children’s Hospital Research Institute Columbus, Ohio 43215

3 5

Running SLURM in a laptop
by Antonio Rius 19 Feb '25

19 Feb '25

For testing purposes I am using SLURM installed in a lenovo laptop running ubuntu 24-04. I am trying this configuration as part of my slurm learning process before using a real HPC facility. In my trial I want to submit an array of 9 independent processes. The CPU information of such machine is: Total Logical CPUs: 16 Physical Cores: 12 Threads per Core: 2 ================================= and the slurm script (attached) that I am using contain the sentences : #SBATCH --job-name=process_pkl #SBATCH --array=1-9 #SBATCH --output=/tmp/process_pkl_%A_%a.out #SBATCH --error=/tmp/process_pkl_%A_%a.err #SBATCH --ntasks=1 #SBATCH --cpus-per-task=1 #SBATCH --mem=2G =================================== but in practice, using htop, only 4 processes are running concurrently, and there are other 12 CPUs without activity. My question is: What am I missing in my setup?

4 3

slurmrestd health check
by Hagdorn, Magnus Karl Moritz 19 Feb '25

19 Feb '25

Hi there, we use haproxy to distribute SLURM REST API requests to multiple instances of slurmrestd. For the haproxy we need a health check. At the moment we are just checking that we get a 401 status. This works but we are ending up with a lot of noise in the log files. It would be very nice if there was an unauthenticated REST API endpoint that can be used to check the daemon is up. Has anybody solved this issue? Refards magnus -- Dr. Magnus Hagdorn Charité – Universitätsmedizin Berlin Geschäftsbereich IT | Scientific Computing

1 0

/etc/passwd sync?
by mark.w.moorcroft＠ama-inc.com 19 Feb '25

19 Feb '25

If you set up slurm elastic cloud in EC2 without LDAP, what is the recommended method for sync of the passwd/group files? Is this necessary to get openmpi jobs to run. I would swear I had this working last week without synced passwd on two nodes. But thinking about it now I'm not sure how this could have worked. My home directories are in an NFS mount, but the user accounts don't exist on the node AMI. I'm using ansible/packer to manage the AMI's. When I ran OpenHPC / Slurm on bare metal there was a sync process. This is my first AWS Slurm cluster rodeo. I can't use the Amazon Parallel Computing tools because we are forced to be in GovCloud. I started with "ClusterInTheCloud", but it's all 4 years old, and semi-broken out of the box. My manager had me ditch a lot of it (including LDAP). So I'm building out a fork that is getting heavily modded for our situation. An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun. This could be caused by a number of factors, including an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity

4 4

how to set slurmdbd.conf if using two slurmdb node with HA database?
by taleintervenor＠sjtu.edu.cn 19 Feb '25

19 Feb '25

The deployment scenario is as follows: nodeA nodeB (slurmctld) (backup slurmctld) | \-------------------------------/ | | / \ | nodeC nodeD (slurmdbd) (backup slurmdbd) (mysql) <--multi master replica--> (mysql) Since the database is multi-master replicated, the slurmdbd should only talk to the mysql on its own node. In such case, how should we set the slurmdbd.conf? The conf file contains options "DbdAddr", "DbdHost" and "DbdBackupHost". Should they be consistent between nodeA-2 and nodeB-2? Such as: DbdAddr = nodeC | DbdAddr = nodeC DbdHost = nodeC | DbdHost = nodeC DbdBackupHost = nodeD | DbdBackupHost = nodeD StorageHost = nodeC | StorageHost = nodeD Or maybe just set different conf and don't use the "DbdBackupHost" like: DbdAddr = nodeC | DbdAddr = nodeD DbdHost = nodeC | DbdHost = nodeD StorageHost = nodeC | StorageHost = nodeD I'm quite confused about the usage of DbdAddr and DbdHost. What is the difference between them and why only DbdHost has the backup one? Another confusing point is how DbdBackupHost work. I guess It is slurmctld that is responsible for selecting the available slurmdbd. Since the slurm.conf already contains "AccountingStorageHost" and "AccountingStorageBackupHost", why we need set backupdbd again on slurmdbd side?

2 1

2025

2024