Manual compilation of 24.05.4; slurmctld and slurmd run on the same server. Everything works OK, but all test jobs end up pending with an InvalidAccount message. I do not use the Slurm database and have not enabled accounting. I cannot find an answer for this behavior or spot a misconfiguration. The slurm.conf file was generated using the easy config tool. Any ideas how to fix this? Thx,
-Henk
## looks like all users have access to test queue
[hmeij@sharptail2 slurm]$ sinfo -o "%g %.10R %.20l"
GROUPS PARTITION TIMELIMIT
all test infinite
[hmeij@sharptail2 slurm]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test* up infinite 1 idle sharptail2
## simple sleep job
[hmeij@sharptail2 slurm]$ sbatch sleep
Submitted batch job 8
[hmeij@sharptail2 slurm]$ squeue
JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_MEMORY NODELIST(REASON)
8 test sleep hmeij PD 0:00 1 1 1G (InvalidAccount)
[hmeij@sharptail2 slurm]$ scontrol show job 8
JobId=8 JobName=sleep
UserId=hmeij(8216) GroupId=its(623) MCS_label=N/A
Priority=1 Nice=0 Account=(null) QOS=(null)
JobState=PENDING Reason=InvalidAccount Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2024-11-11T13:27:14 EligibleTime=2024-11-11T13:27:14
AccrueTime=2024-11-11T13:27:14
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-11-11T13:27:14 Scheduler=Main
Partition=test AllocNode:Sid=sharptail2:644662
ReqNodeList=(null) ExcNodeList=(null)
NodeList=
NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:1:1
ReqTRES=cpu=1,mem=1G,node=1,billing=1
AllocTRES=(null)
Socks/Node=1 NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=1G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/zfshomes/hmeij/slurm/sleep
WorkDir=/zfshomes/hmeij/slurm
StdErr=/zfshomes/hmeij/slurm/err
StdIn=/dev/null
StdOut=/zfshomes/hmeij/slurm/out
TresPerTask=cpu=1
## within a minute or so the InvalidAccount reason changes to None
## (but the job remains pending; jobs 1-7 were stuck over the weekend)
[hmeij@sharptail2 slurm]$ squeue
JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_MEMORY NODELIST(REASON)
8 test sleep hmeij PD 0:00 1 1 1G (None)
## in the slurmctld.log
slurmctld: sched: JobId=8 has invalid account
slurmctld: debug: set_job_failed_assoc_qos_ptr: Filling in assoc for JobId=8 Assoc=0
slurmctld: debug: sched: Running job scheduler for full queue.
slurmctld: error: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one.
## and the slurm.conf accounting section (both AccountingStorageType lines yield the same behavior)
#AccountingStorageType=
AccountingStorageType=accounting_storage/none
#JobAcctGatherFrequency=30
#JobAcctGatherType=
## using
SchedulerType = sched/builtin
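## a hedged sanity check (not a fix): confirm which accounting-related settings the running slurmctld actually loaded, in case the live config differs from the file on disk
scontrol show config | grep -iE 'AccountingStorage|Enforce|JobAcctGather'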
Hello,
is there any way to listen to job state changes in Slurm 23.x or newer?
I'd like to subscribe to job state changes instead of polling
for job states.
Adding this feature to slurm accounting DB seems to be last option right
now, although I’d like to avoid it.
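A possible per-job workaround, short of a true event stream, might be Slurm's trigger mechanism; the notify script path below is hypothetical, and site permissions for strigger apply:
# run a (hypothetical) script when job 12345 finishes
strigger --set --jobid=12345 --fini --program=/usr/local/bin/notify_job_done.sh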
Thanks&Best
Egonle
Dear Slurm User list,
I would like to startup all ~idle (idle and powered down) nodes and
check programmatically if all came up as expected. For context: this is
for a program that sets up slurm clusters with on demand cloud scheduling.
In the simplest fashion this could be executing a command like *srun
FORALL hostname* which would return the names of the nodes if it
succeeds and an error message otherwise. However, there's no such input
value like FORALL as far as I am aware. One could use -N{total node
number} as all nodes are ~idle when this executes, but I don't know an
easy way to get the total number of nodes.
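A rough sketch of one way to do that with standard commands, assuming a single partition spans all nodes and powered-down nodes are resumed on demand:
# count unique node names across the cluster, then ask for one task per node
total=$(sinfo -h -N -o "%N" | sort -u | wc -l)
srun -N "$total" -t 5 hostname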
Best regards,
Xaver
Hi,
I'm using slurm on a small 8 nodes cluster. I've recently added one GPU
node with two Nvidia A100, one with 40Gb of RAM and one with 80Gb.
As usage of this GPU resource increases, I would like to manage it
with GRES to avoid usage conflicts. But at this time my setup does not
work, as I can reach a GPU without reserving it:
srun -n 1 -p tenibre-gpu ./a.out
can use a GPU even if the reservation does not specify this resource
(checked by running nvidia-smi on the node). "tenibre-gpu" is a slurm
partition with only this gpu node.
Following the documentation I've created a gres.conf file; it has been
propagated to all the nodes (9 compute nodes, 1 login node and the
management node) and slurmd has been restarted.
gres.conf is:
## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env
In slurm.conf I have checked these flags:
## Basic scheduling
SelectTypeParameters=CR_Core_Memory
SchedulerType=sched/backfill
SelectType=select/cons_tres
## Generic resources
GresTypes=gpu
## Nodes list
....
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN
....
#partitions
PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00
DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP
Nodes=tenibre-gpu-0
...
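For comparison, a minimal sketch of the pieces that usually accompany gres.conf, assuming they are not already set elsewhere: a Gres= declaration on the node line, and device constraint via cgroups (with TaskPlugin=task/cgroup) so that unreserved GPUs are actually hidden from jobs:
# slurm.conf (node line; counts assumed to be one GPU of each type)
NodeName=tenibre-gpu-0 Gres=gpu:A100-40:1,gpu:A100-80:1 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
# cgroup.conf
ConstrainDevices=yes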
Maybe I've missed something? I'm running Slurm 20.11.7-1.
Thanks for your advice.
Patrick
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
~$ ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
linux-vdso.so.1 (0x00007ffd9a3f4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)
/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is present during compilation.
Also I can see that NVML headers were found in config.status (else I
wouldn't get gpu_nvml.so at all, to my understanding).
Our old cluster was deployed with NVIDIA deepops (which compiles Slurm
on every node) and also has NVML support. There ldd gives the expected
result:
~$ ldd /usr/local/lib/slurm/gpu_nvml.so
...
libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1
(0x00007f3b10120000)
...
I can't test actual functionality with my new binaries because I don't
have a node with GPUs yet.
Am I missing something?
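One hedged check, in case the plugin resolves libnvidia-ml at runtime (e.g. via dlopen) rather than linking it, which would keep it out of the ldd output:
# does the plugin reference NVML symbols or the library name at all?
strings /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so | grep -iE 'nvml|libnvidia-ml' | sort -u | head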
thank you
Matthias
We are pleased to announce the availability of Slurm release candidate
24.11.0rc1.
To highlight some new features coming in 24.11:
- New gpu/nvidia plugin. This does not rely on any NVIDIA libraries, and
will build by default on all systems. It supports basic GPU detection
and management, but cannot currently identify GPU-to-GPU links, or
provide usage data as these are not exposed by the kernel driver.
- Add autodetected GPUs to the output from "slurmd -C".
- Added new QOS-based reports to "sreport".
- Revamped network I/O with the "conmgr" thread-pool model.
- Added new "hostlist function" syntax for management commands and
configuration files.
- switch/hpe_slingshot - Added support for hardware collectives setup
through the fabric manager. (Requires SlurmctldParameters=enable_stepmgr)
- Added SchedulerParameters=bf_allow_magnetic_slot configuration option
to allow backfill planning for magnetic reservations.
- Added new "scontrol listjobs" and "liststeps" commands to complement
"listpids", and provide --json/--yaml output for all three subcommands.
- Allow jobs to be submitted against multiple QOSes.
- Added new experimental "oracle" backfill scheduling support, which
permits jobs to be delayed if the oracle function determines the reduced
fragmentation of the network topology is sufficiently advantageous.
- Improved responsiveness of the controller when jobs are requeued by
replacing the "db_index" identifier with a slurmctld-generated unique
identifier. ("SLUID")
- New options to job_container/tmpfs to permit site-specific scripts to
modify the namespace before user steps are launched, and to ensure all
steps are completely captured within that namespace.
This is the first release candidate of the upcoming 24.11 release
series, and represents the end of development for this release, and a
finalization of the RPC and state file formats.
If any issues are identified with this release candidate, please report
them through https://bugs.schedmd.com against the 24.11.x version and we
will address them before the first production 24.11.0 release is made.
Please note that the release candidates are not intended for production use.
A preview of the updated documentation can be found at
https://slurm.schedmd.com/archive/slurm-master/ .
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
Hi,
I'm running on Ubuntu 20.04. I've got a clean configuration of slurmctld
and slurmd on one node.
1) I've configured oci.conf to the defaults defined by "OCI.CONF EXAMPLE
FOR RUNC USING RUN (RECOMMENDED OVER USING CREATE/START):".
I have a container that I can run by hand:
runc --rootless true run -b
/opt/pilot_results/results.20241108-184831/step1 test
sh-4.2# exit
exit
and it returns.
However, when I
srun --container=/opt/pilot_results/results.20241108-184831/step1 ls
it hangs after completing the ls, and I have to double ctrl-c out of it.
2) I tried using the configuration for RUNC with Create/Start and it hangs
on start.
3) I tried using the configuration for CRUN using RUN, I can run the
container by hand with crun, but srun fails with:
srun --container=/opt/pilot_results/results.20241108-194914/step1 bash
bind socket to `/run/user/1008//pd-builds-bench-1.jrp.34.0.0/notify`:
Address already in use
sync socket closed
srun: error: pd-builds-bench-1: task 0: Exited with exit code 1
4) I tried using the configuration for CRUN with Create/Start and it errors
repeatedly with:
slurmstepd: error: _get_container_state: RunTimeQuery failed rc:256
output:error opening file
`/run/user/1008//pd-builds-bench-1.jrp.51.0.0/status`: No such file or
directory
I went through the (open and closed) support tickets and couldn't find
anything that reflects any of these errors, and I'm pretty stuck at this
point.
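One hypothetical thing worth trying between attempts is clearing stale container state, since both the "Address already in use" and the missing status file point at leftovers under the runtime's state root (paths here mirror the error messages; adjust --root to whatever your oci.conf uses):
crun --root=/run/user/$(id -u)/ list
crun --root=/run/user/$(id -u)/ delete --force pd-builds-bench-1.jrp.34.0.0   # ID taken from the error above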
Any help would be welcome.
Thanks,
JRP
We are running into a problem where slurmctld is segfaulting a few
times a day. We had this problem with SLURM 23.11.8 and now with 23.11.10
as well, though the problem only appears on one of the several SLURM
clusters we have, and all of them use one of those versions of SLURM. I was
wondering if anyone has encountered a similar issue and has any thoughts on
how to prevent this.
Obviously we use "SchedulerType=sched/backfill" but strangely when
I switched to sched/builtin for a while there were still slurmctld
segfaults. We also set
"SchedulerParameters=enable_user_top,bf_max_job_test=2000". I have tried
turning those off but it did not help. I have also tried tweaking several
other settings to no avail. Most of the cluster runs Rocky Linux 8.10
(including the slurmctld system) though we still have some Scientific Linux
7.9 compute nodes (we compile SLURM separately for those).
Here is the crash-time error from journalctl:
Oct 02 06:31:20 our.host.name kernel: sched_agent[2048355]: segfault at 8
ip 00007fec755d7ea8 sp 00007fec6bffe7e8 error 4 in
libslurmfull.so[7fec7555a000+1f4000]
Oct 02 06:31:20 our.host.name kernel: Code: 48 39 c1 7e 19 48 c1 f8 06 ba
01 00 00 00 48 d3 e2 48 f7 da 48 0b 54 c6 10 48 21 54 c7 10 c3 b8 00 00 00
00 eb da 48 8b 4f 08 <48> 39 4e 08 48 0f 4e 4e 08 49 89 c9 48 83 f9 3f 76
4e ba 40 00 00
Oct 02 06:31:20 our.host.name systemd[1]: Started Process Core Dump (PID
2169426/UID 0).
Oct 02 06:31:20 our.host.name systemd-coredump[2169427]: Process 2048344 (
slurmctld) of user 991 dumped core.
This is followed by a list of each of the dozen or so related threads. The
one which is dumping core is first and looks like this:
Stack trace of thread 2048355:
#0 0x00007fec755d7ea8 bit_and_not (libslurmfull.so)
#1 0x000000000044531f _job_alloc (slurmctld)
#2 0x000000000044576b _job_alloc_whole_node_internal (slurmctld)
#3 0x0000000000446e6d gres_ctld_job_alloc_whole_node (slurmctld)
#4 0x00007fec722e29b8 job_res_add_job (select_cons_tres.so)
#5 0x00007fec722f7c32 select_p_select_nodeinfo_set (select_cons_tres.so)
#6 0x00007fec756e7dc7 select_g_select_nodeinfo_set (libslurmfull.so)
#7 0x0000000000496eb3 select_nodes (slurmctld)
#8 0x0000000000480826 _schedule (slurmctld)
#9 0x00007fec753421ca start_thread (libpthread.so.0)
#10 0x00007fec745f78d3 __clone (libc.so.6)
I have run slurmctld with "debug5" level logging and it appears that
the error occurs right after backfill considers a large number of jobs.
Slurmctld could be failing at the end of backfill or when doing something
which happens just after backfill runs. Usually this is the last message
before the crash:
[2024-09-25T18:39:42.076] slurmscriptd: debug: _slurmscriptd_mainloop:
finished
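Since systemd-coredump is already capturing the crash, a fuller backtrace from the saved core may help narrow down which structure bit_and_not is walking (assuming Slurm debug symbols are installed); a rough sketch:
coredumpctl list slurmctld
coredumpctl gdb slurmctld      # then inside gdb: thread apply all bt full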
If anyone has any thoughts or advice on this that would be
appreciated. Thank you.
--
Marcus Lauer
Systems Administrator
CETS Group, Research Support
I am trying to find the GPU hour utilization for a user during a specific time period using the sacct and sreport commands. However, I am noticing a significant difference between the outputs of these two commands.
Could you explain the reasons for this discrepancy? Are there specific factors or configurations in SLURM that could lead to variations in the reported GPU hours?
There is a significant discrepancy in the results produced for GPU hour utilization by the sacct and sreport commands in SLURM.
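For reference, a sketch of the two kinds of queries being compared (user name and dates are placeholders); sacct reports per-job data while sreport works from aggregated usage rollups, which is often where the difference comes from:
sacct -u someuser -S 2024-10-01 -E 2024-11-01 -X --format=JobID,Elapsed,AllocTRES%60
sreport -t Hours -T gres/gpu cluster AccountUtilizationByUser Users=someuser Start=2024-10-01 End=2024-11-01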
Thanks,
Manisha
------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
------------------------------------------------------------------------------------------------------------
I have a cluster which uses Slurm 23.11.6
When I submit a multi-node job and run something like
clush -b -w $SLURM_JOB_NODELIST "date"
very often the ssh command fails with:
Access denied by pam_slurm_adopt: you have no active jobs on this node
This will happen maybe on 50% of the nodes
There is the same behaviour if I salloc a number of nodes and then try
to ssh to a node.
I have traced this to slurmstepd spawning a long sleep, which I believe
allows proctrackd to 'see' if a job is active.
On nodes that I can ssh into:
root 3211 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmd
--systemd
root 3227 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmstepd
infinity
root 24322 1 0 15:40 ? 00:00:00 slurmstepd:
[15709.extern]
root 24326 24322 0 15:40 ? 00:00:00 \_ sleep 100000000
On nodes where I cannot ssh:
root 3226 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmd
--systemd
root 3258 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmstepd
infinity
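A couple of hedged checks, since pam_slurm_adopt relies on the job's extern step existing on every allocated node (which, as I understand it, requires PrologFlags=contain):
scontrol show config | grep -i PrologFlags
squeue -s -j <jobid>          # do the job's steps include a .extern entry on the failing nodes?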
Maybe I am not understanding something here?
PS. I have tried running the pam_slurm_adopt module with options to debug,
and have not found anything useful.
John H
Hi,
Is there a way to change/control the primary node (i.e. where the initial task starts) as part of a job's allocation.
For eg, if a job requires 6 CPUs & its allocation is distributed over 3 hosts h1, h2 & h3, I find that it always starts the task on 1 particular node (say h1) irrespective of how many slots were available in the hosts.
Can we somehow let slurm have the primary node as h2?
Is there any C-API inside the select plugin which can do this trick if we were to control it through the configured select plugin?
Thanks. -Bhaskar.
Hello,
Is there any data structure in slurmctld which portrays the dynamic relative priority of pending jobs?
We are trying to use slurm for developing a scheduling solution, and one of the problems we face at the outset is how to determine the order of scheduling for pending jobs.
One option is to find scheduling iteration window begin & close pointers & cache the job ids as seen in order & then make them the priority order at that point of time.
( This means for 500 pending jobs, say, if we can find which slurmctld calls mark the beginning & end of a sched iteration, then we can use the scheduling order of jobs as the relative priority order for that period of time; of course it may change depending on fairshare, user-initiated priority modification etc. )
A concrete existing data structure showing the dynamic priority itself from slurmctld would be handy.
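Short of an internal structure, the externally visible form of that data is each pending job's priority value, e.g.:
sprio -l                                       # per-factor priority breakdown for pending jobs
squeue --state=PD -o "%.18i %.10Q %.9P %.8u"   # %Q prints the job's current priority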
Help appreciated.
Thanks!
Bhaskar.
Hello,
Just to add some context here. We plan to use slurm for developing a sched solution which interacts with a backend system.
Now, the backend system has pieces of h/w which require a specific host in the allocation to be the primary/master host where the initial task would be launched; this in turn is driven by the job's placement orientation on the h/w itself.
So, our primary task should launch on the requested primary host while secondary / remote tasks would subsequently get started on other hosts.
Hope this brings some context to the problem as to why a specific host is necessary to be the starting host.
Regards, Bhaskar.
On Thursday 31 October, 2024 at 12:04:37 am IST, Laura Hild <lsh(a)jlab.org> wrote:
I think if you tell the list why you care which of the Nodes is BatchHost, they may be able to provide you with a better solution.
________________________________________
From: Bhaskar Chakraborty via slurm-users <slurm-users(a)lists.schedmd.com>
Sent: Wednesday, 30 October 2024 12:35
To: slurm-users(a)schedmd.com
Subject: [slurm-users] Change primary alloc node
Hi,
Is there a way to change/control the primary node (i.e. where the initial task starts) as part of a job's allocation.
For eg, if a job requires 6 CPUs & its allocation is distributed over 3 hosts h1, h2 & h3 I find that it always starts the task in 1 particular
node (say h1) irrespective of how many slots were available in the hosts.
Can we somehow let slurm have the primary node as h2?
Is there any C-API inside select plugin which can do this trick if we were to control it through the configured select plugin?
Thanks.
-Bhaskar.
I have set AllowAccounts=sunlabc5hpc,root, but it doesn’t seem to work. User c010637 is not part of the sunlabc5hpc account but is still able to use the sunlabc5hpc partition. I have tried setting EnforcePartLimits to ALL, ANY, and NO, but none of these options resolved the issue.
[c010637@sl-login ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu* up infinite 3 mix sl-c[0035,0042-0043]
cpu* up infinite 1 idle sl-c0036
gpu up infinite 3 idle sl-c[0045-0047]
sunlabc5hpc up infinite 1 idle sl-c0048
[c010637@sl-login ~]$ scontrol show partition sunlabc5hpc
PartitionName=sunlabc5hpc
AllowGroups=ALL AllowAccounts=sunlabc5hpc,root AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=sl-c0048
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=256 TotalNodes=1 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
TRES=cpu=256,mem=515000M,node=1,billing=256,gres/gpu=8
[c010637@sl-login ~]$ sacctmgr list assoc format=cluster,user,account%20,qos user=$USER
Cluster User Account QOS
---------- ---------- -------------------- --------------------
snowhpc c010637 c010637_bank normal
[c010637@sl-login ~]$ sacctmgr list account sunlabc5hpc
Account Descr Org
---------- -------------------- --------------------
sunlabc5h+ sunlabc5hpc sunlabc5hpc
[c010637@sl-login ~]$ sacctmgr show assoc where Account=sunlabc5hpc format=User,Account
User Account
---------- ----------
sunlabc5h+
c010751 sunlabc5h+
snowdai sunlabc5h+
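A hedged check worth adding: what enforcement settings the controller is actually running with (and which cluster name it registers as), since partition AllowAccounts filtering depends, as far as I know, on how job accounts are resolved and enforced:
scontrol show config | grep -iE 'AccountingStorageEnforce|EnforcePartLimits|ClusterName'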
Thanks for all your help. So it seems we can skip the trouble of compiling SLURM over different mariadb versions.
Tianyang Zhang
SJTU Network Information Center
From: Sid Young <sid.young(a)gmail.com>
Sent: 30 October 2024 7:19
To: Andrej Sec <andrej.sec(a)savba.sk>
Cc: taleintervenor(a)sjtu.edu.cn; slurm-users(a)lists.schedmd.com
Subject: Re: [slurm-users] Re: Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?
I recently upgraded from 20.11 to 24.05.2, before moving the cluster from CentOS 7.9 to Oracle Linux 8.10.
The DB upgrade should be pretty simple, do a mysqldump first, then uninstall the old DB, change the repo's and install the new DB version. It should recognise the DB files on disk and access them. Do another DB backup on the new DB version. then roll through the Slurm upgrades.
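A rough sketch of that backup step (the database name is assumed to be the default slurm_acct_db; ideally stop slurmdbd while dumping so nothing is writing to it):
mysqldump -u root -p --single-transaction --databases slurm_acct_db > slurm_acct_db.$(date +%F).sql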
I picked the first and last version of each release, and systematically went through each node till it was done. First the slurm controller node, then the compute nodes. To avoid Job loss, drain the nodes or you end up with a situation where the slurmd can't talk to the running slurmstepd and the job(s) gets lost. (Shows as a "Protocol Error").
Ole sent me a link to this guide which mostly worked.
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurm…
Sid Young
W: https://off-grid-engineering.com
On Tue, Oct 29, 2024 at 6:33 PM Andrej Sec via slurm-users <slurm-users(a)lists.schedmd.com> wrote:
Hi,
we are facing a similar task. We have a Slurm 22.05 / MariaDB 5.5.68 environment and want to upgrade to a newer version. According to the documentation, it’s recommended to upgrade from 22.05 to a maximum of 23.11 in one step. With the MariaDB upgrade, there’s a challenge between 10.1 and 10.2+ due to incompatible changes (https://mariadb.com/kb/en/changes-improvements-in-mariadb-10-2). This upgrade, as I understand from the documentation, requires at least slurm 22.05, where it is automatically handled by the slurmdbd service.
In the test lab, we performed the following tests:
a. Incremental upgrade - according to MariaDB recommendations:
1. Upgrade MariaDB 5.5.68 -> 10.1.48 -> 10.2.44
2. Start the Slurm suite 22.05, checking content after each MariaDB upgrade step. During the 10.1 -> 10.2 upgrade, the slurmdbd service automatically converted the database to the required format. We had enabled general.log in MariaDB, allowing detailed inspection of database changes during conversion.
3. Upgrade slurmdbd to version 23.11
4. Upgrade slurmctld to version 23.11
5. Upgrade slurmd to version 23.11
6. Check the database content and compare tests before and after the upgrade (we used various reports with scontrol, sreport, sacct, sacctmgr for verification).
b. Direct MariaDB upgrade from 5.5.68 to 10.2.44 using the same approach. According to the tests, this resulted in the same state as the incremental approach.
PS: If you proceed with the upgrade, I would appreciate it if you could let us know about any potential challenges you encountered.
Andrej Sec
nscc, Bratislava, Slovakia
_____
Od: "hermes via slurm-users" <slurm-users(a)lists.schedmd.com <mailto:slurm-users@lists.schedmd.com> >
Komu: slurm-users(a)lists.schedmd.com <mailto:slurm-users@lists.schedmd.com>
Odoslané: pondelok, 28. október 2024 8:48:19
Predmet: [slurm-users] =?eucgb2312_cn?q?=D7=AA=B7=A2=3A_What_is_the_safe_upgrade_path_when_upgrade_from_slurm21=2E08_and_mariadb5=2E5=3F?=
Hi everyone:
We are currently running business on SLURM21.08 and mariadb5.5.
When talking about the upgrade, we need to keep all the users and jobs history data. And we see the official document wrote:
“When upgrading an existing accounting database to MariaDB 10.2.1 or later from an older version of MariaDB or any version of MySQL, ensure you are running slurmdbd 22.05.7 or later. These versions will gracefully handle changes to MariaDB default values that can cause problems for slurmdbd.”
So does this mean we have to first build SLURM > 22.05 over mariadb 5.5, do the SLURM upgrade, then upgrade mariadb to a newer version and rebuild the same version of SLURM over the new mariadb-devel?
And is it safe to jump directly from mariadb 5.5 to the latest version? How can we check whether Slurm has correctly inherited the historical data?
Thanks,
Tianyang Zhang
SJTU Network Information Center
Hi everyone:
We are currently running business on SLURM21.08 and mariadb5.5.
When talking about the upgrade, we need to keep all the users and jobs
history data. And we see the official document wrote:
“When upgrading an existing accounting database to MariaDB 10.2.1 or later
from an older version of MariaDB or any version of MySQL, ensure you are
running slurmdbd 22.05.7 or later. These versions will gracefully handle
changes to MariaDB default values that can cause problems for slurmdbd.”
So does this mean we have to first build SLURM > 22.05 over mariadb 5.5 and do
the SLURM upgrade, then upgrade mariadb to a newer version and rebuild
the same version of SLURM over the new mariadb-devel?
And is it safe to jump directly from mariadb 5.5 to the latest version? How can
we check whether Slurm has correctly inherited the historical data?
Thanks,
Tianyang Zhang
SJTU Network Information Center
Hi,
Is there an option in slurm to launch a custom script at the time of job submission through sbatch or salloc? The script should run with the submitting user's permissions in the submission area.
The idea is that we need to query something which characterises our job's requirements, like CPU slots, memory etc., from a central server, and we need read access to the user area prior to that.
In our use case the user doesn't necessarily know beforehand what kind of resources his job needs. (Hence the need for such a script, which will contact the server with user area info.)
Based on it we can modify the job a little later. A post-submit script, if available, would inform us of the Slurm job id as well; it would get called just after the job has entered the system and prior to its scheduling.
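Absent a built-in pre/post-submit hook that runs as the user in the submit directory, one workaround is a thin wrapper around sbatch; the service URL below is made up, and Slurm's own job_submit and cli_filter plugin hooks (which run in different contexts) may also be worth reviewing:
#!/bin/bash
# hypothetical wrapper: runs as the submitting user in the submission directory,
# asks a (made-up) central service for a CPU estimate, then forwards it to sbatch
cpus=$(curl -sf "https://resource-oracle.example.org/estimate?dir=$PWD") || cpus=1
exec sbatch --cpus-per-task="$cpus" "$@"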
Thanks, Bhaskar.
Sent from Yahoo Mail for iPad
We have a 'gpu' partition with 30 or so nodes, some with A100s, some with
H100s, and a few others.
It appears that when (for example) all of the A100 GPUs are in use, if
there are additional jobs requesting A100 GPUs pending, and those jobs have
the highest priority in the partition, then jobs submitted for H100s won't
run even if there are idle H100s. This is a small subset of our present
pending queue - the four bottom jobs should be running, but aren't. The top
pending job shows reason 'Resources' while the rest all show 'Priority'.
Any thoughts on why this might be happening?
JOBID PRIORITY TRES_ALLOC
8317749 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317750 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317745 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317746 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8338679 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338678 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338677 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338676 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
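A couple of hedged diagnostics that may narrow this down, using one of the H100 job IDs from the table above:
scontrol show job 8338679 | grep -Ei 'Priority|Reason|LastSchedEval'
sdiag | grep -iA6 backfill        # how far each backfill cycle gets through the queue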
Thanks,
Kevin
--
Kevin Hildebrand
University of Maryland
Division of IT
I am unable to limit the number of jobs per user per partition. I
have searched the internet, the forums and the slurm documentation.
I created a partition with a QOS having MaxJobsPU=1 and MaxJobsPA=1,
and created a user stephen with account=stephen and MaxJobs=1.
However, if I sbatch a test job (sleep 180) multiple times they all
run concurrently. I am at a loss of what else to do. Help would be much
appreciated.
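A few hedged checks that usually narrow this down (QOS limits are only applied when accounting enforcement includes "limits", and the QOS has to actually be attached to the partition or association):
scontrol show partition | grep -iE 'PartitionName|QoS'
sacctmgr show qos format=Name,MaxJobsPU,MaxJobsPA
scontrol show config | grep -i AccountingStorageEnforce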
Thank you
--
Stephen Connolly
JSI Data Systems Ltd
613-727-9353
stephen(a)jsidata.ca
Hello everyone,
I’ve recently encountered an issue where some nodes in our cluster enter
a drain state randomly, typically after completing long-running jobs.
Below is the output from the sinfo command showing the reason "Prolog
error":
root@controller-node:~# sinfo -R
REASON          USER    TIMESTAMP            NODELIST
Prolog error    slurm   2024-09-24T21:18:05  node[24,31]
When checking the slurmd.log files on the nodes, I noticed the
following errors:
[2024-09-24T17:18:22.386] [217703.extern] error: _handle_add_extern_pid_internal: Job 217703 can't add pid 3311892 to jobacct_gather plugin in the extern_step. (repeated 90 times)
[2024-09-24T17:18:22.917] [217703.extern] error: _handle_add_extern_pid_internal: Job 217703 can't add pid 3313158 to jobacct_gather plugin in the extern_step.
...
[2024-09-24T21:17:45.162] launch task StepId=217703.0 request from UID:54059 GID:1600 HOST:<SLURMCTLD_IP> PORT:53514
[2024-09-24T21:18:05.166] error: Waiting for JobId=217703 REQUEST_LAUNCH_PROLOG notification failed, giving up after 20 sec
[2024-09-24T21:18:05.166] error: slurm_send_node_msg: [(null)] slurm_bufs_sendto(msg_type=RESPONSE_SLURM_RC_MSG) failed: Unexpected missing socket error
[2024-09-24T21:18:05.166] error: _rpc_launch_tasks: unable to send return code to address:port=<SLURMCTLD_IP>:53514 msg_type=6001: No such file or directory
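Once the underlying prolog failure is addressed, the drained nodes can be returned to service with something like:
scontrol update NodeName=node[24,31] State=RESUME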
If you know how to solve these errors, please let me know. I would
greatly appreciate any guidance or suggestions for further troubleshooting.
Thank you in advance for your assistance.
Best regards,
--
Télécom Paris <https://www.telecom-paris.fr>
Nacereddine LADDAOUI
Ingénieur de Recherche et de Développement
19 place Marguerite Perey
CS 20031
91123 Palaiseau Cedex
Has anyone else noticed, somewhere between versions 22.05.11 and 23.11.9, losing fixed Features defined for a node in slurm.conf, and instead now just having those controlled by a NodeFeaturesPlugin like node_features/knl_generic?
Slurm version 24.05.4 is now available and includes a fix for a recently
discovered security issue with the new stepmgr subsystem.
SchedMD customers were informed on October 9th and provided a patch on
request; this process is documented in our security policy. [1]
A mistake in authentication handling in stepmgr could permit an attacker
to execute processes under other users' jobs. This is limited to jobs
explicitly running with --stepmgr, or on systems that have globally
enabled stepmgr through "SlurmctldParameters=enable_stepmgr" in their
configuration. CVE-2024-48936.
Downloads are available at https://www.schedmd.com/downloads.php .
Release notes follow below.
- Tim
[1] https://www.schedmd.com/security-policy/
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 24.05.4
> ==========================
> -- Fix generic int sort functions.
> -- Fix user look up using possible unrealized uid in the dbd.
> -- Fix FreeBSD compile issue with tls/none plugin.
> -- slurmrestd - Fix regressions that allowed slurmrestd to be run as SlurmUser
> when SlurmUser was not root.
> -- mpi/pmix fix race conditions with het jobs at step start/end which could
> make srun to hang.
> -- Fix not showing some SelectTypeParameters in scontrol show config.
> -- Avoid assert when dumping removed certain fields in JSON/YAML.
> -- Improve how shards are scheduled with affinity in mind.
> -- Fix MaxJobsAccruePU not being respected when MaxJobsAccruePA is set
> in the same QOS.
> -- Prevent backfill from planning jobs that use overlapping resources for the
> same time slot if the job's time limit is less than bf_resolution.
> -- Fix memory leak when requesting typed gres and --[cpus|mem]-per-gpu.
> -- Prevent backfill from breaking out due to "system state changed" every 30
> seconds if reservations use REPLACE or REPLACE_DOWN flags.
> -- slurmrestd - Make sure that scheduler_unset parameter defaults to true even
> when the following flags are also set: show_duplicates, skip_steps,
> disable_truncate_usage_time, run_away_jobs, whole_hetjob,
> disable_whole_hetjob, disable_wait_for_result, usage_time_as_submit_time,
> show_batch_script, and or show_job_environment. Additionaly, always make
> sure show_duplicates and disable_truncate_usage_time default to true when
> the following flags are also set: scheduler_unset, scheduled_on_submit,
> scheduled_by_main, scheduled_by_backfill, and or job_started. This effects
> the following endpoints:
> 'GET /slurmdb/v0.0.40/jobs'
> 'GET /slurmdb/v0.0.41/jobs'
> -- Ignore --json and --yaml options for scontrol show config to prevent mixing
> output types.
> -- Fix not considering nodes in reservations with Maintenance or Overlap flags
> when creating new reservations with nodecnt or when they replace down nodes.
> -- Fix suspending/resuming steps running under a 23.02 slurmstepd process.
> -- Fix options like sprio --me and squeue --me for users with a uid greater
> than 2147483647.
> -- fatal() if BlockSizes=0. This value is invalid and would otherwise cause the
> slurmctld to crash.
> -- sacctmgr - Fix issue where clearing out a preemption list using
> preempt='' would cause the given qos to no longer be preempt-able until set
> again.
> -- Fix stepmgr creating job steps concurrently.
> -- data_parser/v0.0.40 - Avoid dumping "Infinity" for NO_VAL tagged "number"
> fields.
> -- data_parser/v0.0.41 - Avoid dumping "Infinity" for NO_VAL tagged "number"
> fields.
> -- slurmctld - Fix a potential leak while updating a reservation.
> -- slurmctld - Fix state save with reservation flags when a update fails.
> -- Fix reservation update issues with parameters Accounts and Users, when
> using +/- signs.
> -- slurmrestd - Don't dump warning on empty wckeys in:
> 'GET /slurmdb/v0.0.40/config'
> 'GET /slurmdb/v0.0.41/config'
> -- Fix slurmd possibly leaving zombie processes on start up in configless when
> the initial attempt to fetch the config fails.
> -- Fix crash when trying to drain a non-existing node (possibly deleted
> before).
> -- slurmctld - fix segfault when calculating limit decay for jobs with an
> invalid association.
> -- Fix IPMI energy gathering with multiple sensors.
> -- data_parser/v0.0.39 - Remove xassert requiring errors and warnings to have a
> source string.
> -- slurmrestd - Prevent potential segfault when there is an error parsing an
> array field which could lead to a double xfree. This applies to several
> endpoints in data_parser v0.0.39, v0.0.40 and v0.0.41.
> -- scancel - Fix a regression from 23.11.6 where using both the --ctld and
> --sibling options would cancel the federated job on all clusters instead of
> only the cluster(s) specified by --sibling.
> -- accounting_storage/mysql - Fix bug when removing an association
> specified with an empty partition.
> -- Fix setting multiple partition state restore on a job correctly.
> -- Fix difference in behavior when swapping partition order in job submission.
> -- Fix security issue in stepmgr that could permit an attacker to execute
> processes under other users' jobs. CVE-2024-48936.
I have a SLURM configuration of 2 hosts with 6 + 4 CPUs.
I am submitting jobs with sbatch -n <CPU slots> <job script>.
However, I see that even when I have exhausted all 10 CPU slots with running jobs, it's still allowing subsequent jobs to run!
The CPU slot availability is also shown as full for the 2 hosts. No job is found pending.
What could be the problem?
My Slurm.conf looks like (host names are changed to generic):
ClusterName=MyCluster
ControlMachine=host1
ControlAddr=<some address>
SlurmUser=slurmsa
#AuthType=auth/munge
StateSaveLocation=/var/spool/slurmd
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmctldDebug=6
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=host1
#AccountingStoragePass=medslurmpass
#AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageUser=slurmsa
#TaskPlugin=task/cgroup
NodeName=host1 CPUs=6 SocketsPerBoard=3 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
NodeName=host2 CPUs=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=host1,host2 Default=YES MaxTime=INFINITE State=UP
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
TaskPlugin=task/affinity
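A hedged way to see what the controller believes is allocated versus configured while the extra jobs are running:
scontrol show node host1,host2 | grep -E 'CPUAlloc|CPUTot|State'
squeue -t R -o "%.10i %.8u %.5C %R"     # %C = CPUs allocated to each running job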
Thanks in advance for any help!
Regards, Bhaskar.