- slurm-users - lists.schedmd.com

mariadb refusing access
by Steven Jones 05 Mar '25

05 Mar '25

In the logs I am seeing, root@vuwunicoslurmd3 mariadb]# tail -f mariadb.log 2025-03-04 19:01:32 12565 [Warning] Access denied for user 'slurm'@'localhost' (using password: YES) 2025-03-04 19:06:19 12566 [Warning] Access denied for user 'slurm'@'localhost' (using password: YES) However mysql -u slurm -p works just fine so it seems to be a config error for slurmdbd Any ideas please? regards Steven

4 6

Tracking costs - variable costs per partition
by Jeffrey A Dusenberry 05 Mar '25

05 Mar '25

Hello - We're in a similar situation as was described here: https://groups.google.com/g/slurm-users/c/eBDslkwoFio where we want to track (and control) costs on a fairly heterogenous system with different billing weights per partition. The solution proposed seems like it would work rather well, except our use of fairshare seems to interfere with the billing values we would want to use to limit usage based on credits granted. We have PriorityDecayHalfLife set on our system, so that billing value (GrpTRESRaw) seems to drop with time. Is there a way to implement something similar on an otherwise fairshare-based system? Thanks, Jeff

1 0

orted errors
by Berg, Stephen P CIV USN NRL DET SSC MS (USA) 04 Mar '25

04 Mar '25

I'm running a small-ish slurm grid, 87 nodes with various hardware. On a few occasions lately users submitting jobs will get an orted error and the job fails. Try again a few hours later or the next day and the same job runs just fine. Google-fu indicated it might be a DNS issue if for whatever reason a node couldn't figure out the address for other nodes in the job. So I populated the /etc/hosts on each node with a complete listing of all the nodes so there wouldn't be any reliance on DNS. And that very afternoon another job failed with orted. So it seems at least in my case DNS isn't the issue. What's the best way to troubleshoot this when orted fails but doesn't give any sort of error to indicate what the root cause of the failure might be? And I also can't predictably induce the failure, just have to wait until it randomly chokes.

2 1

Slurm versions 24.11.2 and 24.05.6 are now available
by Marshall Garey 04 Mar '25

04 Mar '25

We are pleased to announce the availability of Slurm versions 24.11.2 and 24.05.6. 24.11.2 fixes a variety of minor to major bugs. Fixed regressions include loading non-default QOS on pending jobs from pre-24.11 state, pending jobs displaying QOS=(null) when not explicitly requesting a QOS, running jobs that requested multiple partitions potentially having an incorrect partition when slurmctld is restarted, and burst_buffer.lua failing if slurm.conf is in a non-standard location. This release also fixes a few crashes in slurmctld: crashing when a job that can preempt requests --test-only, crasing when the scheduler evaluates a job on nodes with suspended jobs, and crashing due to a long-standing bug causing a job record without job_resrcs. 24.05.6 fixes sattach with auth/slurm, a slurmrestd crash when using data_parser/v0.0.40, a slurmctld crash when using job suspension, a performance regression for RPCs with large amounts of data, and some other moderate severity bugs. Downloads are available at https://www.schedmd.com/downloads.php . -- Marshall Garey Release Management, Support, and Development SchedMD LLC - Commercial Slurm Development and Support > * Changes in Slurm 24.11.2 > ========================== > -- Fix segfault when submitting --test-only jobs that can preempt. > -- Fix regression introduced in 23.11 that prevented the following > flags from being added to a reservation on an update: > DAILY, HOURLY, WEEKLY, WEEKDAY, and WEEKEND. > -- Fix crash and issues evaluating job's suitability for running in > nodes with already suspended job(s) there. > -- Slurmctld will ensure that healthy nodes are not reported as > UnavailableNodes in job reason codes. > -- Fix handling of jobs submitted to a current reservation with > flags OVERLAP,FLEX or OVERLAP,ANY_NODES when it overlaps nodes with a > future maintenance reservation. When a job submission had a time limit that > overlapped with the future maintenance reservation, it was rejected. Now > the job is accepted but stays pending with the reason "ReqNodeNotAvail, > Reserved for maintenance". > -- pam_slurm_adopt - avoid errors when explicitly setting > some arguments to the default value. > -- Fix qos preemption with PreemptMode=SUSPEND > -- slurmdbd - When changing a user's name update lineage > at the same time. > -- Fix regression in 24.11 in which burst_buffer.lua does not > inherit the SLURM_CONF environment variable from slurmctld and fails to run > if slurm.conf is in a non-standard location. > -- Fix memory leak in slurmctld if select/linear and the > PreemptParameters=reclaim_licenses options are both set in slurm.conf. > Regression in 24.11.1. > -- Fix running jobs, that requested multiple partitions, from > potentially being set to the wrong partition on restart. > -- switch/hpe_slingshot - Fix compatibility with newer cxi > drivers, specifically when specifying disable_rdzv_get. > -- Add ABORT_ON_FATAL environment variable to capture a backtrace > from any fatal() message. > -- Fix printing invalid address in rate limiting log statement. > -- sched/backfill - Fix node state PLANNED not being cleared from > fully allocated nodes during a backfill cycle. > -- select/cons_tres - Fix future planning of jobs with bf_licenses. > -- Prevent redundant "on_data returned rc: Rate limit exceeded, > please retry momentarily" error message from being printed in > slurmctld logs. > -- Fix loading non-default QOS on pending jobs from pre-24.11 state. > -- Fix pending jobs displaying QOS=(null) when not explicitly > requesting a QOS. > -- Fix segfault issue from job record with no job_resrcs > -- Fix failing sacctmgr delete/modify/show account operations > with where clauses. > -- Fix regression in 24.11 in which Slurm daemons started catching > several SIGTSTP, SIGTTIN and SIGUSR1 signals and ignored them, while before > they were not ignoring them. This also caused slurmctld to not being > able to shutdown after a SIGTSTP because slurmscriptd caught the signal > and stopped while slurmctld ignored it. Unify and fix these situations and > get back to the previous behavior for these signals. > -- Document that SIGQUIT is no longer ignored by slurmctld, > slurmdbd, and slurmd in 24.11. As of 24.11.0rc1, SIGQUIT is identical to > SIGINT and SIGTERM for these daemons, but this change was not documented. > -- Fix not considering nodes marked for reboot without ASAP > in the scheduler. > -- Remove the boot^ state on unexpected node reboot after > return to service. > -- Do not allow new jobs to start on a node which is being rebooted > with the flag nextstate=resume. > -- Prevent lower priority job running after cancelling an ASAP reboot. > -- Fix srun jobs starting on nextstate=resume rebooting nodes. > > * Changes in Slurm 24.05.6 > ========================== > -- data_parser/v0.0.40 - Prevent a segfault in the slurmrestd when > dumping data with v0.0.40+complex data parser. > -- Fix sattach when using auth/slurm. > -- scrun - Add support '--all' argument for kill subcommand. > -- Fix performance regression while packing larger RPCs. > -- Fix crash and issues evaluating job's suitability for running in > nodes with already suspended job(s) there. > -- Fixed a job requeuing issue that merged job entries into the > same SLUID when all nodes in a job failed simultaneously. > -- switch/hpe_slingshot - Fix compatibility with newer cxi > drivers, specifically when specifying disable_rdzv_get. > -- Add ABORT_ON_FATAL environment variable to capture a backtrace > from any fatal() message.

7 6

Re: how to set slurmdbd.conf if using two slurmdb node with HA database?
by taleintervenor＠sjtu.edu.cn 27 Feb '25

27 Feb '25

Do you mean the second configuration scheme? I think configuring `dbdhost=localhost` is the same as configuring ` DbdAddr =nodeC` and ` DbdAddr =nodeD` on the two nodes respectively. The key point is whether we should set the DbdBackupHost option and how it work? 发件人: Daniel Letai <dani(a)letai.org.il> 发送时间: 2025年2月19日 18:21 收件人: slurm-users(a)lists.schedmd.com 主题: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database? I'm not sure it will work, didn't test it, but could you just do `dbdhost=localhost` to solve this? On 18/02/2025 11:59, hermes via slurm-users wrote: The deployment scenario is as follows: nodeA nodeB (slurmctld) (backup slurmctld) | \-------------------------------/ | | / \ | nodeC nodeD (slurmdbd) (backup slurmdbd) (mysql) <--multi master replica--> (mysql) Since the database is multi-master replicated, the slurmdbd should only talk to the mysql on its own node. In such case, how should we set the slurmdbd.conf? The conf file contains options “DbdAddr”, “DbdHost” and “DbdBackupHost”. Should they be consistent between nodeA-2 and nodeB-2? Such as: DbdAddr = nodeC | DbdAddr = nodeC DbdHost = nodeC | DbdHost = nodeC DbdBackupHost = nodeD | DbdBackupHost = nodeD StorageHost = nodeC | StorageHost = nodeD Or maybe just set different conf and don’t use the “DbdBackupHost” like: DbdAddr = nodeC | DbdAddr = nodeD DbdHost = nodeC | DbdHost = nodeD StorageHost = nodeC | StorageHost = nodeD I’m quite confused about the usage of DbdAddr and DbdHost. What is the difference between them and why only DbdHost has the backup one? Another confusing point is how DbdBackupHost work. I guess It is slurmctld that is responsible for selecting the available slurmdbd. Since the slurm.conf already contains “AccountingStorageHost” and “AccountingStorageBackupHost”, why we need set backupdbd again on slurmdbd side?

4 10

Limit CPUs per job (but not per user, partition or node)
by Herbert Fruchtl 26 Feb '25

26 Feb '25

We have a cluster with multi-core nodes (168) that can be shared by multiple jobs at the same time. How do I configure a partition such that it only accepts jobs requesting up to (say) 8 cores, but will run multiple jobs at the same time? The following is apparently not working: PartitionName=debug Nodes=node01 MaxTime=02:00:00 DefMemPerCPU=1000 MaxCPUsPerNode=8 Default=NO It allows one job using 8 cores, but a second one will not start because the limit is apparently for the partition as a whole. Thanks in advance, Herbert -- Herbert Fruchtl (he/him) Senior Scientific Computing Officer / HPC Administrator School of Chemistry, IT Services University of St Andrews -- The University of St Andrews is a charity registered in Scotland: No SC013532

3 2

Re: MinTRES in QoS and power saving
by Patryk Bełzak 26 Feb '25

26 Feb '25

Hi, there was this issue raised some time ago: https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg10799.html We're experiencing exactly the same issue now with GPU nodes in power saving, some (but not all) jobs doesn't start because of that, and it's annoying users - badly. Anyone experienced something similiar? Or maybe Stefan has a solution? (we're on version 24.05.5) Best regards Patryk Belzak. -- HPC Administrator Wroclaw Centre for Networking and Supercomputing

1 0

Shard conf weirdness
by Reed Dier 24 Feb '25

24 Feb '25

Hoping someone can help me pin down the weirdness I’m experiencing. There are actually two issues, I’ve run into, the root issue, and then something odd when trying to work around the root issue. v23.11.10 - Ubuntu 22.04 - slurm-smd debs built <https://slurm.schedmd.com/archive/slurm-23.11.10/quickstart_admin.html#debu…> from the tarball. I have 2 slurmctld daemons, and 1 slurmdbd daemon. The slurm.conf is consistent across the cluster, all ctld,dbd,slurmd, have the same shasum hash. The backup slurmctld does not appear to like the gres configured on my gpu nodes, despite the primary slurmctld having no issues. When failing over with scontrol takeover, I get the following log messages on the secondary slurmctld, where it complains about a reported, but not configured typed gres/shard. > [2025-02-21T05:02:02.017] error: Setting node $HOST2 state to INVAL with reason:gres/shard type (p100) reported but not configured > [2025-02-21T05:02:02.018] drain_nodes: node $HOST2 state set to DRAIN > [2025-02-21T05:02:02.018] error: _slurm_rpc_node_registration node=$HOST2: Invalid argument > [2025-02-21T05:02:02.020] error: Setting node $HOST3 state to INVAL with reason:gres/shard type (p40) reported but not configured > [2025-02-21T05:02:02.020] drain_nodes: node $HOST3 state set to DRAIN > [2025-02-21T05:02:02.020] error: _slurm_rpc_node_registration node=$HOST3: Invalid argument > [2025-02-21T05:02:02.023] error: Setting node $HOST1 state to INVAL with reason:gres/shard type (t4) reported but not configured > [2025-02-21T05:02:02.020] drain_nodes: node $HOST1 state set to DRAIN > [2025-02-21T05:02:02.023] error: _slurm_rpc_node_registration node=$HOST1: Invalid argument And looking at those hosts in the slurm.conf, the shards are not typed, but generic. > NodeName=$HOST1 [SNIP] State=UNKNOWN Gres=gpu:t4:2,shard:8 > NodeName=$HOST2 [SNIP] State=UNKNOWN Gres=gpu:p100:2,shard:8 > NodeName=$HOST3 [SNIP] State=UNKNOWN Gres=gpu:p40:2,gpu:p100:2,shard:16 I should also point out at this point that my gres.conf is just AutoDetect=nvml. I am not explicitly mapping any devices. So this is problem 1, the slurmctld behaving differently between daemons, where the primary has zero issue with the configuration, but the secondary balking and draining the nodes consistently. Moving on to problem 2, which is me trying to solve the issue, and running into a different issue. I decided to try to add the typed gres/shards to both the AccountingStorageTRES, and the NodeName lists. > NodeName=$HOST1 [SNIP] State=UNKNOWN Gres=gpu:t4:2,shard:t4:8 > NodeName=$HOST2 [SNIP] State=UNKNOWN Gres=gpu:p100:2,shard:p100:8 > NodeName=$HOST3 [SNIP] State=UNKNOWN Gres=gpu:p40:2,gpu:p100:2,shard:p40:8,shard:p100:8 Now the secondary slurmctld is happy and no longer immediately drains the gpu nodes for the invalid gres. However, the node ($HOST3) with a mixed set of gpus is now complaining. > [2025-02-23T21:41:05.311] gpu/nvml: _get_system_gpu_list_nvml: 4 GPU system device(s) detected > [2025-02-23T21:41:05.311] fatal: _build_shared_list: bad configuration, multiple configurations without "File" No matter what I tried, any line of the NodeName with multiple shard:$type:$num generates the _build_shared_list error above. I tried re-ordering the list as gpu,shard,gpu,shard to no avail, I tried having untyped shards, with typed shards after that, but then it complained about too many shards (double the intended number, $untyped + $typed = too many) So my current solution is to have the slurm conf in sync everywhere BUT the host(s) with multiple gpu models in the same host, where I had to revert to untyped gres on those slurmd’s, but on the slurmctlds they are typed gres. Hopefully I’ve done a decent job of explaining the corner case, enough that someone can point me in the direction of figuring out whats going on, and what is the “correct” way of doing this. I tried increasing the slurmd and slurmctld logging to debug2, to nothing that stood out beyond what was already gathered. > [2025-02-23T21:46:44.895] debug: GRES[shard] Type:p100 Count:8 Cores(88):(null) Links:(null) Flags:HAS_TYPE File:(null) UniqueId:(null) > [2025-02-23T21:46:44.895] debug: GRES[shard] Type:p40 Count:8 Cores(88):(null) Links:(null) Flags:HAS_TYPE File:(null) UniqueId:(null) > [2025-02-23T21:46:44.895] fatal: _build_shared_list: bad configuration, multiple configurations without "File" Any ideas are greatly appreciated, Reed

1 0

Plese help [CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1]
by Hugo Solís 24 Feb '25

24 Feb '25

I am installing slurm on a small cluster of 3 identical computers at the university. It turns out that I get the message when starting: error: NodeNames=hugo-big-1 CPUs=24 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs. error: NodeNames=hugo-big-2 CPUs=24 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs. error: NodeNames=hugo-big-3 CPUs=24 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs. if I run sudo slurmd -C I get NodeName=hugo-big-1 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128586 reading many post it seems there is a bug starting in version 18. However I had modify the slurm.conf file NodeName=hugo-big-1 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128586 Gres=gpu:nvidia_geforce_rtx_3090:2 to several different version where I do not specify CPUs or sockets and I failed everytime. I do need help, any idea? warm regards Hugo Solis BTW. if I run lscpu I got that any product of Thread, cores or sockets do not result in 24. Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 24 On-line CPU(s) list: 0-23 Vendor ID: GenuineIntel Model name: 12th Gen Intel(R) Core(TM) i9-12900K CPU family: 6 Model: 151 Thread(s) per core: 2 Core(s) per socket: 16 Socket(s): 1 Stepping: 2 CPU max MHz: 5200.0000 CPU min MHz: 800.0000 BogoMIPS: 6374.40 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_ti mer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_e pp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities Virtualization features: Virtualization: VT-x Caches (sum of all): L1d: 640 KiB (16 instances) L1i: 768 KiB (16 instances) L2: 14 MiB (10 instances) L3: 30 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-23

2 2

Please help - Building Slurm-24.11.1 Failed
by Zhang, Yuan 23 Feb '25

23 Feb '25

Hello, I got errors about missing perl modules when building slurm24.11.1 rpm packages. Has anyone seen this error before? And how to fix it? Here are the error messages: -- Processing files: slurm-perlapi-24.11.1-1.el8.x86_64 error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Bitstr.pm error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Constant.pm error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Hostlist.pm error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurm/Slurm.so error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/Slurmdb.so error: File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/autosplit.ix RPM build errors: Macro expanded in comment on line 31: %_prefix path install path for commands, libraries, etc. Macro expanded in comment on line 240: %define _unpackaged_files_terminate_build 0 File listed twice: /usr/lib/.build-id/49/9394dac8b96530e1a29d0e35608ef9acedec70 File listed twice: /usr/lib/.build-id/88/4e3a854911920ae0c77c4b32d5af70dad538a9 File listed twice: /usr/lib/.build-id/c1/72dbf51fe1c17868a4e0dbf0aa55a7923f6d02 File listed twice: /usr/lib/.build-id/dc/67c93438d36986971444d218350d27b413c2cb File listed twice: /usr/lib/.build-id/ee/e79f70a62bf62e74705a75daa87d26857452ef Deprecated external dependency generator is used! File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Bitstr.pm File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Constant.pm File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/Slurm/Hostlist.pm File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurm/Slurm.so File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/Slurmdb.so File not found: /home/gdhpcgroup/yxz006/rpmbuild/BUILDROOT/slurm-24.11.1-1.el8.x86_64/usr/lib64/perl5/auto/Slurmdb/autosplit.ix -- Thanks! Yuan -- Yuan Zhang Senior HPC System Programmer Nationwide Children’s Hospital Research Institute Columbus, Ohio 43215

3 5

2025

2024