- slurm-users - lists.schedmd.com

Slurm-wlm depends on X11?
by Jesse Hayward 01 Jul '25

01 Jul '25

Hello group, hopefully a quick one: I am trying to install the suite of standard slurm packages (slurm, munge, slurmdbd, slurm-wlm-basic-plugins) on a compute node. Ubuntu 22.04.05 LTS just upgraded. *However, these packages now bring down x11 (libx11-6 and all of that other garbage in Ubuntu land). * Does slurm actually depend on x11 now? Or do I need to take a look at my apt config and see what's going on over there, instead. Thanks! Jess -- Jesse Hayward Systems Administrator for High Performance Computing Vassar College 845.437.7521 CIS 207

3 3

sbatch strange behavior with different --nodelist (-w) options
by Xinghong He 01 Jul '25

01 Jul '25

Dear Community, I'm seeing strange behavior from sbatch with different --nodelist (-w) options on my two node cluster. Here are my test scripts: *~/slurm$ cat mpirun.slm#!/bin/bash#SBATCH --job-name=mpirun_2x1#SBATCH --nodes=2#SBATCH --ntasks-per-node=1#SBATCH --exclusivesource /usr/mpi/gcc/openmpi-4.1.7a1/bin/mpivars.shmpirun ./a.sh* *~/slurm$ cat a.sh#!/bin/bashecho "`uname -n` OMPI_COMM_WORLD_RANK = $OMPI_COMM_WORLD_RANK SLURM_NODEID = $SLURM_NODEID SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"* If I do not specify any -w option or if I include both nodes in -w option, I get expected results. *~/slurm$ sbatch mpirun.slmSubmitted batch job 71~/slurm$ cat slurm-71.outstd-199 OMPI_COMM_WORLD_RANK = 0 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]std-271 OMPI_COMM_WORLD_RANK = 1 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]* However, if I specify only one node in -w option, but still want two nodes, I always get one expected result and one unexpected result. The unexpected one will dispatch both MPI tasks to the same node. This one is expected - running two MPI tasks across nodes. *~/slurm$ sbatch -w std-199 mpirun.slmSubmitted batch job 72~/slurm$ cat slurm-72.outstd-199 OMPI_COMM_WORLD_RANK = 0 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]std-271 OMPI_COMM_WORLD_RANK = 1 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]* This one if unexpected - ends up running two MPI tasks on the same node, though SLURM_JOB_NODELIST also gives the correct two nodes. *~/slurm$ sbatch -w std-271 mpirun.slmSubmitted batch job 73ubuntu@bright-anchovy-controller:~/slurm$ cat slurm-73.outstd-199 OMPI_COMM_WORLD_RANK = 0 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]std-199 OMPI_COMM_WORLD_RANK = 1 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]* I first saw the problem on a larger cluster where I needed to specify both -w and -x options to include and exclude nodes. I then narrowed it down to a two node cluster. I tried adding options like --hostfile or --rankfile, or -npernode, all does not change how the tasks are dispatched to nodes. The problem is repeatable. Here are the tested systems: Slurm 23.02.5 on Ubuntu 22.04.5 LTS Slurm 24.05.1 on Ubuntu 22.04.4 LTS How to make the last one work. I.e., requesting *-w std-271* and make it run on two nodes. I'd appreciate any help! Regards, Xinghong

1 0

extern step does not exit
by william＠signalbox.org.uk 27 Jun '25

27 Jun '25

I am building a new small cluster on Rocky Linux 9 with Slurm 24.11.5. Slurm was compiled with the default options except that we added --with pmix We do use some additional complications - we use NFSv4 with AD authentication (so we use auks), and we use SElinux in enforcing mode. However, I do not currently see any evidence that those relate to the problem that we have; though note that the slurm account is an AD account. All relevant AD accounts have the RFC 2307 attributes (uid, gid etc.) configured. I had also configured pam_slurm_adopt but as that was causing issues I backed out the changes that enable it. It was too hard to troubleshoot without being able to login to the test compute node easily. The issue I am having in testing is that whatever job I launch, when it terminates the extern job step does not terminate. These are the more relevant slurm.conf settings: Epilog=/etc/slurm/epilog.sh JobAcctGatherType=jobacct_gather/cgroup JobCompType=jobcomp/none KillOnBadExit=1 KillWait=30 ProctrackType=proctrack/cgroup Prolog=/etc/slurm/prolog.sh PrologFlags=Alloc,Contain SelectTypeParameters=CR_Core_Memory SelectType=select/cons_tres SlurmUser=slurm TaskPlugin=task/affinity,task/cgroup TaskProlog=/etc/slurm/taskprolog.sh The slurmd.log on the compute node for the job shows a sequence like this (with -vv debug enabled): [2025-06-26T14:57:23.272] select/cons_tres: init: select/cons_tres loaded [2025-06-26T14:57:23.272] select/linear: init: Linear node selection plugin loaded with argument 20 [2025-06-26T14:57:23.272] cred/munge: init: Munge credential signature plugin loaded [2025-06-26T14:57:23.272] [34.extern] debug: auth/munge: init: loaded [2025-06-26T14:57:23.273] [34.extern] debug: Reading cgroup.conf file /etc/slurm/cgroup.conf [2025-06-26T14:57:23.279] [34.extern] debug: cgroup/v2: init: Cgroup v2 plugin loaded [2025-06-26T14:57:23.290] [34.extern] debug: hash/k12: init: init: KangarooTwelve hash plugin loaded [2025-06-26T14:57:23.291] [34.extern] debug: CPUs:40 Boards:1 Sockets:2 CoresPerSocket:20 ThreadsPerCore:1 [2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: core enforcement enabled [2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: task_cgroup_memory_init: task/cgroup/memory: TotCfgRealMem:191587M allowed:100%(enforced), swap:0%(enforced), max:100%(191587M) max+swap:100%(383174M) min:30M [2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: memory enforcement enabled [2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: device enforcement enabled [2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: Tasks containment cgroup plugin loaded [2025-06-26T14:57:23.291] [34.extern] task/affinity: init: task affinity plugin loaded with CPU mask 0xffffffffff [2025-06-26T14:57:23.291] [34.extern] debug: jobacct_gather/cgroup: init: Job accounting gather cgroup plugin loaded [2025-06-26T14:57:23.291] [34.extern] topology/default: init: topology Default plugin loaded [2025-06-26T14:57:23.292] [34.extern] debug: gpu/generic: init: init: GPU Generic plugin loaded [2025-06-26T14:57:23.293] [34.extern] debug: Setting slurmstepd(2666071) oom_score_adj to -1000 [2025-06-26T14:57:23.293] [34.extern] debug: Message thread started pid = 2666071 [2025-06-26T14:57:23.294] [34.extern] debug: spank: opening plugin stack /etc/slurm/plugstack.conf [2025-06-26T14:57:23.294] [34.extern] debug: /etc/slurm/plugstack.conf: 1: include "/etc/slurm/plugstack.conf.d/*.conf" [2025-06-26T14:57:23.294] [34.extern] debug: spank: opening plugin stack /etc/slurm/plugstack.conf.d/auks.conf [2025-06-26T14:57:23.298] [34.extern] debug: spank: /etc/slurm/plugstack.conf.d/auks.conf:57: Loaded plugin auks.so [2025-06-26T14:57:23.298] [34.extern] debug: SPANK: appending plugin option "auks" [2025-06-26T14:57:23.352] [34.extern] spank-auks: new unique ccache is KCM:71433:49222 <<<< This UID 71433 is the user running the job [2025-06-26T14:57:23.364] [34.extern] spank-auks: user '71433' cred stored in ccache KCM:71433:49222 [2025-06-26T14:57:23.380] [34.extern] debug: task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0' [2025-06-26T14:57:23.381] [34.extern] debug: task/cgroup: task_cgroup_cpuset_create: step abstract cores are '0' [2025-06-26T14:57:23.381] [34.extern] debug: task/cgroup: task_cgroup_cpuset_create: job physical CPUs are '0' [2025-06-26T14:57:23.381] [34.extern] debug: task/cgroup: task_cgroup_cpuset_create: step physical CPUs are '0' [2025-06-26T14:57:23.381] [34.extern] task/cgroup: _memcg_initialize: job: alloc=1024MB mem.limit=1024MB memsw.limit=1024MB job_swappiness=18446744073709551614 [2025-06-26T14:57:23.381] [34.extern] task/cgroup: _memcg_initialize: step: alloc=1024MB mem.limit=1024MB memsw.limit=1024MB job_swappiness=18446744073709551614 [2025-06-26T14:57:23.385] [34.extern] debug: close_slurmd_conn: sending 0: No error [2025-06-26T14:57:24.371] launch task StepId=34.interactive request from UID:71433 GID:70668 HOST:172.17.11.22 PORT:46948 [2025-06-26T14:57:24.372] task/affinity: lllp_distribution: JobId=34 manual binding: mask_cpu,one_thread [2025-06-26T14:57:24.372] debug: Waiting for job 34's prolog to complete [2025-06-26T14:57:24.372] debug: Finished wait for job 34's prolog to complete [2025-06-26T14:57:24.377] select/cons_tres: init: select/cons_tres loaded [2025-06-26T14:57:24.377] select/linear: init: Linear node selection plugin loaded with argument 20 [2025-06-26T14:57:24.377] cred/munge: init: Munge credential signature plugin loaded [2025-06-26T14:57:24.377] [34.interactive] debug: auth/munge: init: loaded [2025-06-26T14:57:24.379] [34.interactive] debug: Reading cgroup.conf file /etc/slurm/cgroup.conf [2025-06-26T14:57:24.384] [34.interactive] debug: cgroup/v2: init: Cgroup v2 plugin loaded [2025-06-26T14:57:24.403] [34.interactive] debug: hash/k12: init: init: KangarooTwelve hash plugin loaded [2025-06-26T14:57:24.404] [34.interactive] debug: CPUs:40 Boards:1 Sockets:2 CoresPerSocket:20 ThreadsPerCore:1 [2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: core enforcement enabled [2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: task_cgroup_memory_init: task/cgroup/memory: TotCfgRealMem:191587M allowed:100%(enforced), swap:0%(enforced), max:100%(191587M) max+swap:100%(383174M) min:30M [2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: memory enforcement enabled [2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: device enforcement enabled [2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: Tasks containment cgroup plugin loaded [2025-06-26T14:57:24.404] [34.interactive] task/affinity: init: task affinity plugin loaded with CPU mask 0xffffffffff [2025-06-26T14:57:24.404] [34.interactive] debug: jobacct_gather/cgroup: init: Job accounting gather cgroup plugin loaded [2025-06-26T14:57:24.404] [34.interactive] topology/default: init: topology Default plugin loaded [2025-06-26T14:57:24.404] [34.interactive] debug: gpu/generic: init: init: GPU Generic plugin loaded [2025-06-26T14:57:24.406] [34.interactive] debug: close_slurmd_conn: sending 0: No error [2025-06-26T14:57:24.406] [34.interactive] debug: Message thread started pid = 2666089 [2025-06-26T14:57:24.406] [34.interactive] debug: Setting slurmstepd(2666089) oom_score_adj to -1000 [2025-06-26T14:57:24.407] [34.interactive] debug: spank: opening plugin stack /etc/slurm/plugstack.conf [2025-06-26T14:57:24.407] [34.interactive] debug: /etc/slurm/plugstack.conf: 1: include "/etc/slurm/plugstack.conf.d/*.conf" [2025-06-26T14:57:24.407] [34.interactive] debug: spank: opening plugin stack /etc/slurm/plugstack.conf.d/auks.conf [2025-06-26T14:57:24.411] [34.interactive] debug: spank: /etc/slurm/plugstack.conf.d/auks.conf:57: Loaded plugin auks.so [2025-06-26T14:57:24.411] [34.interactive] debug: SPANK: appending plugin option "auks" [2025-06-26T14:57:24.461] [34.interactive] spank-auks: new unique ccache is KCM:71433:44156 [2025-06-26T14:57:24.467] [34.interactive] spank-auks: user '71433' cred stored in ccache KCM:71433:44156 [2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0' [2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup: task_cgroup_cpuset_create: step abstract cores are '0' [2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup: task_cgroup_cpuset_create: job physical CPUs are '0' [2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup: task_cgroup_cpuset_create: step physical CPUs are '0' [2025-06-26T14:57:24.481] [34.interactive] task/cgroup: _memcg_initialize: job: alloc=1024MB mem.limit=1024MB memsw.limit=1024MB job_swappiness=18446744073709551614 [2025-06-26T14:57:24.481] [34.interactive] task/cgroup: _memcg_initialize: step: alloc=1024MB mem.limit=1024MB memsw.limit=1024MB job_swappiness=18446744073709551614 [2025-06-26T14:57:24.501] [34.interactive] warning: restricted to a subset of cpus [2025-06-26T14:57:24.503] [34.interactive] debug: stdin uses a pty object [2025-06-26T14:57:24.504] [34.interactive] debug: init pty size 48:324 [2025-06-26T14:57:24.506] [34.interactive] debug levels are stderr='error', logfile='debug', syslog='quiet' [2025-06-26T14:57:24.507] [34.interactive] debug: IO handler started pid=2666089 [2025-06-26T14:57:24.517] [34.interactive] spank-auks: credential renewer launched (pid=2666107) [2025-06-26T14:57:24.517] [34.interactive] starting 1 tasks [2025-06-26T14:57:24.517] [34.interactive] task 0 (2666108) started 2025-06-26T14:57:24 [2025-06-26T14:57:24.537] [34.interactive] debug: task/affinity: task_p_pre_launch: affinity StepId=34.interactive, task:0 bind:mask_cpu,one_thread [2025-06-26T14:57:24.537] [34.interactive] debug: [job 34] attempting to run slurm task_prolog [/etc/slurm/taskprolog.sh] [2025-06-26T14:57:24.537] [34.interactive] debug: Sending launch resp rc=0 [2025-06-26T14:57:24.563] [34.interactive] debug: export name:TMPDIR:val:/local/scratch/34: [2025-06-26T14:59:42.726] [34.interactive] task 0 (2666108) exited with exit code 0. <<<< Here the interactive job was closed [2025-06-26T14:59:42.727] [34.interactive] spank-auks: all tasks exited, killing credential renewer (pid=2666107) [2025-06-26T14:59:42.729] [34.interactive] debug: task/affinity: task_p_post_term: affinity StepId=34.interactive, task 0 [2025-06-26T14:59:42.729] [34.interactive] debug: signaling condition [2025-06-26T14:59:42.729] [34.interactive] debug: jobacct_gather/cgroup: fini: Job accounting gather cgroup plugin unloaded [2025-06-26T14:59:42.729] [34.interactive] debug: Waiting for IO [2025-06-26T14:59:42.729] [34.interactive] debug: Closing debug channel [2025-06-26T14:59:42.729] [34.interactive] debug: IO handler exited, rc=0 [2025-06-26T14:59:42.729] [34.interactive] debug: task/cgroup: fini: Tasks containment cgroup plugin unloaded [2025-06-26T14:59:42.731] [34.interactive] debug: slurm_recv_timeout at 0 of 4, recv zero bytes [2025-06-26T14:59:42.738] [34.interactive] spank-auks: Destroyed ccache KCM:71433:44156 [2025-06-26T14:59:42.740] debug: _rpc_terminate_job: uid = 71755 JobId=34 <<<< That uid 71755 is for the slurm account which does not run any daemons on a compute node [2025-06-26T14:59:42.740] debug: credential for job 34 revoked [2025-06-26T14:59:42.742] [34.extern] debug: Handling REQUEST_SIGNAL_CONTAINER [2025-06-26T14:59:42.742] [34.extern] debug: _handle_signal_container for StepId=34.extern uid=71755 signal=18 flag=0x0 [2025-06-26T14:59:42.742] [34.extern] Sent signal 18 to StepId=34.extern [2025-06-26T14:59:42.742] [34.interactive] debug: Handling REQUEST_SIGNAL_CONTAINER [2025-06-26T14:59:42.742] [34.interactive] debug: _handle_signal_container for StepId=34.interactive uid=71755 signal=18 flag=0x0 [2025-06-26T14:59:42.742] [34.interactive] Sent signal 18 to StepId=34.interactive [2025-06-26T14:59:42.743] [34.extern] debug: Handling REQUEST_SIGNAL_CONTAINER [2025-06-26T14:59:42.743] [34.extern] debug: _handle_signal_container for StepId=34.extern uid=71755 signal=15 flag=0x0 [2025-06-26T14:59:42.743] [34.extern] Sent signal 15 to StepId=34.extern [2025-06-26T14:59:42.743] [34.interactive] debug: Handling REQUEST_SIGNAL_CONTAINER [2025-06-26T14:59:42.743] [34.interactive] debug: _handle_signal_container for StepId=34.interactive uid=71755 signal=15 flag=0x0 [2025-06-26T14:59:42.743] [34.interactive] Sent signal 15 to StepId=34.interactive [2025-06-26T14:59:42.744] [34.interactive] debug: Handling REQUEST_STATE [2025-06-26T14:59:42.744] [34.interactive] debug: Message thread exited [2025-06-26T14:59:42.744] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:42.763] [34.interactive] done with step [2025-06-26T14:59:42.766] [34.extern] debug: signaling condition [2025-06-26T14:59:42.766] [34.extern] debug: task/affinity: task_p_post_term: affinity StepId=34.extern, task 0 [2025-06-26T14:59:42.766] [34.extern] debug: jobacct_gather/cgroup: fini: Job accounting gather cgroup plugin unloaded [2025-06-26T14:59:42.766] [34.extern] debug: task/cgroup: fini: Tasks containment cgroup plugin unloaded [2025-06-26T14:59:42.766] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:42.766] [34.extern] debug: Terminate signal (SIGTERM) received [2025-06-26T14:59:42.768] [34.extern] error: setgroups: Operation not permitted [2025-06-26T14:59:42.768] [34.extern] error: _shutdown_x11_forward: Unable to drop privileges [2025-06-26T14:59:42.771] [34.extern] spank-auks: Destroyed ccache KCM:71433:49222 [2025-06-26T14:59:42.817] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:42.918] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:43.419] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:44.420] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:45.420] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:46.421] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:47.422] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:48.423] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:49.424] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:50.425] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:51.426] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:52.426] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T14:59:53.427] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:03.428] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:12.429] [34.extern] debug: Handling REQUEST_STEP_TERMINATE [2025-06-26T15:00:12.429] [34.extern] debug: _handle_terminate for StepId=34.extern uid=0 [2025-06-26T15:00:12.429] [34.extern] Sent SIGKILL signal to StepId=34.extern [2025-06-26T15:00:12.429] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:12.450] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:12.501] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:12.602] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:13.102] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:14.103] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:14.104] [34.extern] debug: Handling REQUEST_STEP_TERMINATE [2025-06-26T15:00:14.104] [34.extern] debug: _handle_terminate for StepId=34.extern uid=0 [2025-06-26T15:00:14.104] [34.extern] Sent SIGKILL signal to StepId=34.extern [2025-06-26T15:00:15.105] [34.extern] debug: Handling REQUEST_STATE [2025-06-26T15:00:15.105] [34.extern] debug: Handling REQUEST_STEP_TERMINATE . . We are left with a process "slurmstepd: [34.extern]" running, and a matching socket file for job 34 in /var/spool/slurmd; the job shows state 'CG' (Completing). I can clean up by issuing a kill -9 to the slurmstepd 'extern' job on the compute node. I do not know a way to get any logging output from that job step, but it has a file descriptor open to /var/log/slurm/slurmd.log (which is what we configured) so I assume that it is writing some of the above log. In some runs I also see this just as the main task exits (but from the .0 job step, not from .extern) : [2025-06-27T21:00:47.265] [39.0] error: common_file_write_content: unable to open '/sys/fs/cgroup/system.slice/slurmstepd.scope/job_39/step_0/user/cgroup.free ze' for writing: Permission denied I can see that between reboots, the directories '/sys/fs/cgroup/system.slice/slurmstepd.scope/job_NN' remain, with the /sys/fs/cgroup/system.slice/slurmstepd.scope/job_NN/step_extern folder. The permissions of the files referred to as 'permission denied' seem fine to me: # ls -lhZ /sys/fs/cgroup/system.slice/slurmstepd.scope/job_42/step_extern/user/cgroup. freeze -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jun 27 21:46 /sys/fs/cgroup/system.slice/slurmstepd.scope/job_42/step_extern/user/cgroup. freeze There are no SElinux alerts on the system. I am not sure whether the error messages about setgroups and _shutdown_x11_forward are actually the problem, or just something else being reported. The only system that I have to compare with is running Slurm 19.05 on CentOS 7 and is rather different. I would be interested to know if anyone else has had problems with extern job steps not shutting down. William

1 0

Job information if job is completed
by Gestió Servidors 24 Jun '25

24 Jun '25

Hello, Is there any way to get all information (like submit script or submit node) from a job that is completed? Something like "scontrol show jobid=XXX" when job is "running" or "pending". I need to inspect the submit script of a job but I only know job_id. Thanks.

6 5

Implementing a "soft" wall clock limit
by Davide DelVento 24 Jun '25

24 Jun '25

In the institution where I work, so far we have managed to live without mandatory wallclock limits (a policy decided well before I joined the organization), and that has been possible because the cluster was not very much utilized. Now that is changing, with more jobs being submitted and those being larger ones. As such I would like to introduce wallclock limits to allow slurm to be more efficient in scheduling jobs, including with backfill. My concern is that this user base is not used to it and therefore I want to make it easier for them, and avoid common complaints. I anticipate one of them would be "my job was cancelled even though there were enough nodes idle and no other job in line after mine" (since the cluster utilization is increasing, but not yet always full like it has been at most other places I know). So my question is: is it possible to implement "soft" wallclock limits in slurm, namely ones which would not be enforced unless necessary to run more jobs? In other words, is it possible to change the pre-emptability of a job only after some time has passed? I can think of some ways to hack this functionality myself with some cron or at jobs, and that might be easy enough to do, but I am not sure I can make it robust enough to cover all situations, so I'm looking for something either slurm-native or (if external solution) field-tested by someone else already, so that at least the worst kinks have been already ironed out. Thanks in advance for any suggestions you may provide!

7 14

enforce Qos to users
by laddaoui＠telecom-paris.fr 24 Jun '25

24 Jun '25

Hello everyone, I'm trying to use QoS to enforce resource limits on an association, but I'm having trouble with proper enforcement. I created a QoS with resource limits: ``` sacctmgr add qos qos_gpus flags=denyonlimit,overpartqos maxjobsperuser=4 maxtresperjob=gres/gpu=1 ``` Then I assigned it to an account: ``` sacctmgr modify account name=account-a set qos=qos_gpus defaultqos=qos_gpus systemctl restart slurmctld ``` Users in this account can bypass the QoS limits by explicitly specifying a different QoS when submitting jobs: ``` srun --qos=(normal|qos_gpus) ... ``` Even though I set `defaultqos=qos_gpus`, users can still choose any available QoS and bypass the intended resource limits. My question is: How can I restrict users to only using their assigned QoS and prevent them from specifying other QoS options? Is there a configuration I'm missing to enforce QoS restrictions properly? Best, --- info about my setup slurm version : tested on 23.11.4 and 23.02.7 AccountingStorageEnforce = associations,limits EnforcePartLimits = ALL

2 2

pam_slurm_adopt - ssh to compute nodes not working in slurm 24.11
by Marx, Wolfgang 23 Jun '25

23 Jun '25

Hi, We have defined in our cluster, that users can logon to a compute node from a login node,when they have actual a job running on the compute node. To get this functionallity working, we are usinge the pam_slurm_adopt. As long as we were using slurm 23.05 it all was working well. Now we have upgraded to slurm 24.11.5 and the ssh login to the compute nodes is not longer working. When a job of a user is running on a compute node the ssh of the user to this compute node is refused form the compute node. We have not changed anything in our configuration for the pam_slurm_adopt. Also there is no indication in the release notes that anything has changed regarding pam_slurm_adopt. Is this a known bug in Slurm 24.11 and has anyone facing the same problem. This is a very important feature, especially for our ANSYS users. Thanks Wolfgang Marx Wolfgang Marx, Basisdienste, Gruppe Hochleistungrechnen Technische Universität Darmstadt, Hochschulrechenzentrum Alexanderstraße 2, 64283 Darmstadt Tel.: +496151/16-71158 E-Mail: wolfgang.marx(a)tu-darmstadt.de Web: www.hrz.tu-darmstadt.de

2 1

read-only slurm user
by Hagdorn, Magnus Karl Moritz 23 Jun '25

23 Jun '25

Hi there, we use the slurm prometheus exporter to collect slurm metrics. This works pretty well. However, we have noticed that metrics for some of the restricted partitions are not collected. It occurred to me that this is because we are using an unprivileged user to run the exporter. I am trying to figure out the best way to allow an unprivileged user to collect all metrics. I could add the user to all the relevant groups. However, I am also thinking of using the new slurm exporter that uses the API in which case I need to somehow handle a token. It would be nice to have a readonly user, ie a user that cannot submit any jobs but only read the current state of the cluster. I guess setting MaxJobs and MaxSubmitJobs to 0 would do this. Any other suggestions? Regards magnus -- Dr. Magnus Hagdorn Charité – Universitätsmedizin Berlin Geschäftsbereich IT | Scientific Computing Campus Charité Mitte BALTIC - Invalidenstraße 120/121 10115 Berlin https://www.charite.de HPC Helpdesk: sc-hpc-helpdesk(a)charite.de

1 0

Doc Clarification: Heterogeneous Steps in Heterogeneous Job
by Steffen Christgau 19 Jun '25

19 Jun '25

Hi everybody, I am (along with others) a little bit puzzled by the meaning of a statement in the documentation concerning heterogeneous job steps inside het. jobs. The docs state (https://slurm.schedmd.com/archive/slurm-24.11.5/heterogeneous_jobs.html#het…): > You also cannot request heterogeneous steps from within a heterogeneous job. (A) On a very small Slurm test installation with just two nodes, the following het job that requests het steps (does it, right?!) runs fine: $ cat hetjob-steps.sh #!/bin/bash #SBATCH --mem-per-cpu=2g --nodes=1 --cpus-per-task=8 #SBATCH hetjob #SBATCH --mem-per-cpu=1g --nodes=1 --cpus-per-task=4 srun -l --cpus-per-task=4 nproc : -l --cpus-per-task=2 nproc $ cat slurm-125.out 1: 4 2: 2 3: 2 0: 4 The output looks reasonable and it looks like the above quote does not apply since one can apparently request het steps in a het job. Or am I wrong? The intro in the respective section also gives the impression that het jobsteps are a convenience feature that does not require het jobs, but it does not explicitly exclude the usage of het steps in het jobs: > Slurm version 20.11 introduces the ability to request heterogeneous job steps from within a non-homogeneous job allocation. This allows you the flexibility to have different layouts for job steps without requiring the use of heterogeneous jobs, where having separate jobs for the components may be undesirable. So what does the initial statement (A) actually mean then? Am I just using a lucky example which is actually not supported? A short clarification would be helpful. Thanks in advance Steffen

1 0

Wrong MaxRSS Behavior with cgroup v2 in Slurm
by Guillaume COCHARD 19 Jun '25

19 Jun '25

Hello, We've noticed a recent change in how MaxRSS is reported on our cluster. Specifically, the MaxRSS value for many jobs now often matches the allocated memory, which was not the case previously. It appears this change is due to how Slurm accounts for memory when copying large files, likely as a result of moving from cgroup v1 to cgroup v2. Here’s a simple example: copy_file.sh #!/bin/bash cp /distributed/filesystem/file5G /tmp cp /tmp/file5G ~ Two jobs with different memory allocations: Job 1 sbatch -c 1 --mem=1G copy_file.sh seff <jobid> Memory Utilized: 1021.87 MB Memory Efficiency: 99.79% of 1.00 GB Job 2 sbatch -c 1 --mem=10G copy_file.sh seff <jobid> Memory Utilized: 4.02 GB Memory Efficiency: 40.21% of 10.00 GB With cgroup v1, this script typically showed minimal memory usage. Now, under cgroup v2, memory usage appears inflated and depends on the allocated memory, which seems wrong. I believe this behavior aligns with similar issues raised by the Kubernetes community [1], and is consistent with how memory.current behaves in cgroup v2 [3]. According to Slurm’s documentation about cgroup v2, "this plugin provides cgroup's memory.current value from the memory interface, which is not equal to the RSS value provided by procfs. Nevertheless it is the same value that the kernel uses in its OOM killer logic." [2] While technically correct, this seems to mark a significant change in what MaxRSS and "Memory Efficiency" actually measure and renders those metrics almost useless. Our Configuration: ProctrackType=proctrack/cgroup TaskPlugin=task/cgroup,task/affinity Question: Is there a way to restore more realistic MaxRSS values — specifically, ones that exclude file-backed page cache — while still using cgroup v2? Thanks, Guillaume References: [1] https://github.com/kubernetes/kubernetes/issues/118916 [2] https://slurm.schedmd.com/cgroup_v2.html#limitations [3] https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html

3 4

2025

2024

slurm-users ----- 2025 ----- July 2025 June 2025 May 2025 April 2025 March 2025 February 2025 January 2025 ----- 2024 ----- December 2024 November 2024 October 2024 September 2024 August 2024 July 2024 June 2024 May 2024 April 2024 March 2024 February 2024 January 2024

slurm-users