Hi Everyone,
We have a SLURM cluster with three different types of nodes. One partition consists of nodes that have a large number of CPUs: 256 CPUs on each node.
I'm trying to find out the current CPU allocation on some of those nodes
but part of the information I gathered seems to be incorrect. If I use "scontrol show node <node-name>", I get this for the CPU info:
RealMemory=450000 AllocMem=262144 FreeMem=235397 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
CPUAlloc=256 CPUEfctv=256 CPUTot=256 CPULoad=126.65
CfgTRES=cpu=256,mem=450000M,billing=256
AllocTRES=cpu=256,mem=256G
However, when I tried to identify those jobs to which the node's CPUs have been allocated, and get a tally of the allocated CPUs, I can only see 128 CPUs that are effectively allocated on that node, based on the output of squeue --state=R -o "%C %N". So I don't quite understand why the running jobs on
the nodes account for just 128, and not 256, CPU allocation even though
scontrol reports 100% CPU allocation on the node. Could this be due to some
misconfiguration, or a bug in the SLURM version we're running? We're
running Version=23.02.4. The interesting thing is that we have six nodes
that have similar specs, and all of them show up as allocated in the output
of sinfo, but the running jobs on each node account for just 128 CPU
allocation, as if they're all capped at 128.
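For what it's worth, the per-node tally was done roughly along these lines (a sketch only; note that squeue's %C prints each job's total CPU count across all of its nodes, so multi-node jobs could skew a per-node sum):
squeue --state=R --nodelist=<node-name> --noheader -o "%C %N" | awk '{sum += $1} END {print sum}'
# Cross-check with the controller's per-node CPU_IDs for each job on the node:
squeue --state=R --nodelist=<node-name> --noheader -o "%i" | xargs -n1 scontrol -d show job | grep -E 'JobId=|CPU_IDs='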
Any thoughts, suggestions or assistance to figure this out would be greatly
appreciated.
Thanks,
Muhammad
Dear all,
Since the upgrade from Slurm 22.05 to 23.11.1 we are having problems with
the communication between the slurmctld and slurmd processes.
We are running a cluster with 183 nodes and almost 19000 cores.
Unfortunately, some nodes are in a different network, preventing full internode communication. A network topology and the setting TopologyParam=RouteTree have been used to make sure no slurmd communication happens between nodes on different networks.
In the new Slurm version we see the following issues, which did not appear in 22.05:
1. slurmd processes acquire many network connections in CLOSE-WAIT (or CLOSE_WAIT, depending on the tool used), causing the processes to hang when we try to restart slurmd.
When checking for CLOSE-WAIT processes we see the following behaviour:
Recv-Q Send-Q Local Address:Port Peer Address:Port Process
1 0 10.5.2.40:6818 10.5.0.43:58572
users:(("slurmd",pid=1930095,fd=72),("slurmd",pid=1930067,fd=72))
1 0 10.5.2.40:6818 10.5.0.43:58284
users:(("slurmd",pid=1930095,fd=8),("slurmd",pid=1930067,fd=8))
1 0 10.5.2.40:6818 10.5.0.43:58186
users:(("slurmd",pid=1930095,fd=22),("slurmd",pid=1930067,fd=22))
1 0 10.5.2.40:6818 10.5.0.43:58592
users:(("slurmd",pid=1930095,fd=76),("slurmd",pid=1930067,fd=76))
1 0 10.5.2.40:6818 10.5.0.43:58338
users:(("slurmd",pid=1930095,fd=19),("slurmd",pid=1930067,fd=19))
1 0 10.5.2.40:6818 10.5.0.43:58568
users:(("slurmd",pid=1930095,fd=68),("slurmd",pid=1930067,fd=68))
1 0 10.5.2.40:6818 10.5.0.43:58472
users:(("slurmd",pid=1930095,fd=69),("slurmd",pid=1930067,fd=69))
1 0 10.5.2.40:6818 10.5.0.43:58486
users:(("slurmd",pid=1930095,fd=38),("slurmd",pid=1930067,fd=38))
1 0 10.5.2.40:6818 10.5.0.43:58316
users:(("slurmd",pid=1930095,fd=29),("slurmd",pid=1930067,fd=29))
The first IP address is that of the compute node, the second that of the
node running slurmctld. The nodes can communicate using these IP addresses
just fine.
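For reference, the listing above is the kind of output a check along these lines produces (the exact invocation here is an assumption; ss is from iproute2):
ss -tnp state close-wait '( sport = :6818 )'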
2. slurmd cannot be properly restarted
[2024-01-18T10:45:26.589] slurmd version 23.11.1 started
[2024-01-18T10:45:26.593] error: Error binding slurm stream socket: Address
already in use
[2024-01-18T10:45:26.593] fatal: Unable to bind listen port (6818): Address
already in use
This is probably because of the processes stuck in CLOSE-WAIT, which can only be killed with SIGKILL (kill -9).
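A sketch of the restart workaround implied above (assuming slurmd is managed by systemd):
ss -tlnp '( sport = :6818 )'   # confirm which slurmd PID still holds the listen port
pkill -KILL -x slurmd          # only SIGKILL frees the port, as noted above
systemctl start slurmd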
3. We see jobs stuck in the completing (CG) state, probably due to communication
issues between slurmctld and slurmd. The slurmctld sends repeated kill
requests but those do not seem to be acknowledged by the client. This
happens more often in large job arrays, or generally when many jobs start
at the same time. However, this could be just a biased observation (i.e.,
it is more noticeable on large job arrays because there are more jobs to
fail in the first place).
4. Since the new version we also see messages like:
[2024-01-17T09:58:48.589] error: Failed to kill program loading user
environment
[2024-01-17T09:58:48.590] error: Failed to load current user environment
variables
[2024-01-17T09:58:48.590] error: _get_user_env: Unable to get user's local
environment, running only with passed environment
The effect of this is that the users run with the wrong environment and
can’t load the modules for the software that is needed by their jobs. This
leads to many job failures.
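If it helps for comparison: as far as we understand, these messages appear when slurmd tries to capture the user's login environment for the batch job, so a rough manual check for slow or broken login shells (only a guess at the mechanism, not a confirmed diagnosis) is something like:
time su - <username> -c env > /dev/null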
The issue appears to be somewhat similar to the one described at:
https://bugs.schedmd.com/show_bug.cgi?id=18561
In that case the site downgraded the slurmd clients to 22.05 which got rid
of the problems.
We’ve now downgraded the slurmd on the compute nodes to 23.02.7 which also
seems to be a workaround for the issue.
Does anyone know of a better solution?
Kind regards,
Fokke Dijkstra
--
Fokke Dijkstra <f.dijkstra(a)rug.nl>
Team High Performance Computing
Center for Information Technology, University of Groningen
Postbus 11044, 9700 CA Groningen, The Netherlands
Hello,
We have a use case in which we need to launch multiple concurrently running MPI applications inside a job allocation. Most supercomputing facilities limit the number of concurrent job steps as they incur an overhead with the global Slurm scheduler. Some frameworks, such as the Flux framework from LLNL, claim to mitigate this issue by starting an instance of their own scheduler inside an allocation, which then acts as the resource manager for the compute nodes in the allocation.
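For concreteness, the pattern in question is essentially many concurrent job steps inside a single allocation, along these lines (a sketch; app_a and app_b are placeholder binaries):
#!/bin/bash
#SBATCH --nodes=4
# Each srun below creates a separate job step, and every step launch is
# negotiated with the central slurmctld, which is where the overhead comes from.
srun --nodes=2 --ntasks=8 --exact ./app_a &
srun --nodes=2 --ntasks=8 --exact ./app_b &
wait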
Out of curiosity, I was wondering if there is a fundamental reason behind having a single global scheduler that the srun launch commands must contact to launch job steps. Was it perhaps considered overkill to develop a ‘hierarchical’ design in which Slurm launches a local job daemon for every allocation that manages resources for that allocation? I would appreciate your insight in understanding more about Slurm’s core design.
Thanks and regards,
Kshitij Mehta
Oak Ridge National Laboratory
Our cluster has developed a strange intermittent behaviour where jobs are being put into a pending state because they fail the AssocGrpCpuLimit check, even though the submitting user has enough CPUs for the job to run.
For example:
$ squeue -o "%.6i %.9P %.8j %.8u %.2t %.10M %.7m %.7c %.20R"
JOBID PARTITION NAME USER ST TIME MIN_MEM MIN_CPU NODELIST(REASON)
799 normal hostname andrewss PD 0:00 2G 5 (AssocGrpCpuLimit)
...so the job isn't running, and it's the only job in the queue, but:
$ sacctmgr list associations part=normal user=andrewss format=Account,User,Partition,Share,GrpTRES
Account User Partition Share GrpTRES
---------- ---------- ---------- --------- -------------
andrewss andrewss normal 1 cpu=5
That user has a limit of 5 CPUs so the job should run.
The weird thing is that this effect is intermittent. A job can hang and the queue will stall for ages, but then it will suddenly start working: you can submit several jobs and they all work, until one fails again.
The cluster has active nodes and plenty of resource:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up infinite 2 idle compute-0-[6-7]
interactive up 1-12:00:00 3 idle compute-1-[0-1,3]
The slurmctld log just says:
[2024-03-14T16:21:41.275] _slurm_rpc_submit_batch_job: JobId=799 InitPrio=4294901720 usec=259
Whilst it's in this state I can run other jobs with core requests of up to 4 and they work, but not 5. It's like slurm is adding one CPU to the request and then denying it.
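One extra check that might help while it's in this state is dumping the controller's in-memory association limits and usage (a sketch; scontrol show assoc_mgr reads slurmctld's cache rather than the slurmdbd database):
scontrol show assoc_mgr users=andrewss flags=assoc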
I'm sure I'm missing something fundamental but would appreciate it if someone could point out what it is!
Thanks
Simon.
Our website has gone through some much-needed change and we'd love for you to explore it!
The new SchedMD.com is equipped with the latest information about
Slurm, your favorite workload manager, and details about SchedMD
services, support, and training offerings.
Toggle through our Industries pages
(https://www.schedmd.com/slurm-industries/) to learn more about how
Slurm can service your specific site needs. Why Slurm?
(https://www.schedmd.com/slurm/why-slurm/) gives you all the basics
around our market-leading scheduler and SchedMD Services
(https://www.schedmd.com/slurm-support/our-services/) addresses all
the ways we can help you optimize your site.
These new web pages also feature access to our Documentation Site, Bug
Site, and Installation Guide. Browse our Events tab to see where we'll
be when, and be sure to register for our Slurm User Group (SLUG) in
Oslo, Norway this fall!
(https://www.schedmd.com/about-schedmd/events/)
SchedMD.com, your one stop shop for all things Slurm. Check it out now!
--
Victoria Hobson
SchedMD LLC
Vice President of Marketing
I'm a little late to this party but would love to establish contact with others using slurm in Kubernetes.
I recently joined a research institute in Vienna (IIASA) and I'm getting to grips with slurm and Kubernetes (my previous role was data engineering / fintech). My current setup sounds like what Urban described in this thread, back in Nov 22. It has some rough edges though.
Right now, I'm trying to upgrade to slurm-23.11.4 in Ubuntu 23.10 containers. I'm having trouble with the cgroup/v2 plugin.
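For context, the cgroup configuration being tested is roughly this minimal cgroup.conf (values are illustrative, not a recommendation):
CgroupPlugin=cgroup/v2
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes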
Are you still using slurm on K8s Urban? How did your installation work out Hans?
Would either of you be willing to share your experiences?
Regards,
Alan.
Hi all,
We're trying to enable sharding on our compute cluster.
On this cluster:
- ensicompute-1 comes with 1 NVIDIA V100 GPU;
- ensicompute-13 comes with 3 NVIDIA A40 GPUs;
- all other nodes (for now, ensicompute-11 and ensicompute-12, but several others will come) come with 3 NVIDIA RTX 6000 GPUs.
To enable sharding, I followed these steps:
1. [slurm.conf] Add "shard" to GresTypes;
2. [slurm.conf] Add "shard:N" to Gres for each node. For testing purposes, I have set N to 9, so each GPU can execute up to 3 jobs concurrently:
NodeName=ensicompute-[11-12] Gres=gpu:Quadro:3,shard:9 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
3. [gres.conf] Declare the shards after the definition of the GPU GRES.
For step 3, I tried different things, leading to different outcomes:
a. Define a global number of shards, for the entire host:
Name=shard Count=9
==> This way, sharding seems to work OK, but all the jobs are executed on GPU#0. If running 12 jobs, for example, 9 of them are assigned to GPU#0 and start executing, while 3 of them remain in a pending state. No job is assigned to GPU#1 or GPU#2 (see the test sketch after this list).
b. Define a per-GPU number of shards, associated with the device file representing the GPU:
Name=shard Count=3 File=/dev/nvidia0
Name=shard Count=3 File=/dev/nvidia1
Name=shard Count=3 File=/dev/nvidia2
==> In this case, the slurmd service fails to start on the compute node. The error message found in /var/log/slurmd.log is "fatal: Invalid GRES record for shard, count does not match File value".
c. Don't define anything about shards in gres.conf.
==> Same behavior as in a.: all jobs are executed on GPU#0.
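The test sketch referred to above, for reference (illustrative only; the --wrap payload just reports which GPU each job ends up seeing):
for i in $(seq 1 12); do
  sbatch --job-name="shard-test-$i" --gres=shard:1 \
    --wrap='echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES; nvidia-smi -L; sleep 120'
done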
I attach to this message the full content of the slurm.conf and gres.conf files.
What is the proper way to configure sharding in a cluster with several GPUs per node?
Is there a way to specify how many shards should be allocated to each GPU?
Cheers,
François
=== slurm.conf ===
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=ensimag
SlurmctldHost=nash
ProctrackType=proctrack/cgroup
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
ReturnToService=2
#
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# LOGGING AND ACCOUNTING
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
GresTypes=gpu,shard
NodeName=ensicompute-1 Gres=gpu:Tesla:1,shard:3 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
NodeName=ensicompute-13 Gres=gpu:A40:3,shard:9 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
NodeName=ensicompute-[11-12] Gres=gpu:Quadro:3,shard:9 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
PartitionName=compute Nodes=ALL Default=YES MaxTime=INFINITE State=UP
=== gres.conf ===
AutoDetect=off
# ensicompute-1
NodeName=ensicompute-1 Name=gpu Type=Tesla File=/dev/nvidia0
NodeName=ensicompute-1 Name=shard Count=3 File=/dev/nvidia0
# ensicompute-11
NodeName=ensicompute-11 Name=gpu Type=Quadro File=/dev/nvidia0
NodeName=ensicompute-11 Name=gpu Type=Quadro File=/dev/nvidia1
NodeName=ensicompute-11 Name=gpu Type=Quadro File=/dev/nvidia2
NodeName=ensicompute-11 Name=shard Count=3 File=/dev/nvidia0
NodeName=ensicompute-11 Name=shard Count=3 File=/dev/nvidia1
NodeName=ensicompute-11 Name=shard Count=3 File=/dev/nvidia2
# ensicompute-12
NodeName=ensicompute-12 Name=gpu Type=Quadro File=/dev/nvidia0
NodeName=ensicompute-12 Name=gpu Type=Quadro File=/dev/nvidia1
NodeName=ensicompute-12 Name=gpu Type=Quadro File=/dev/nvidia2
NodeName=ensicompute-12 Name=shard Count=3 File=/dev/nvidia0
NodeName=ensicompute-12 Name=shard Count=3 File=/dev/nvidia1
NodeName=ensicompute-12 Name=shard Count=3 File=/dev/nvidia2
# ensicompute-13
NodeName=ensicompute-13 Name=gpu Type=A40 File=/dev/nvidia0
NodeName=ensicompute-13 Name=gpu Type=A40 File=/dev/nvidia1
NodeName=ensicompute-13 Name=gpu Type=A40 File=/dev/nvidia2
NodeName=ensicompute-13 Name=shard Count=3 File=/dev/nvidia0
NodeName=ensicompute-13 Name=shard Count=3 File=/dev/nvidia1
NodeName=ensicompute-13 Name=shard Count=3 File=/dev/nvidia2
--
François Broquedis, Engineer, IT Services
Grenoble INP - Ensimag, office E208
681 rue de la Passerelle
BP 72, 38402 Saint Martin d'Hères CEDEX
Tel.: +33 (0)4 76 82 72 78