I am building a new small cluster on Rocky Linux 9 with Slurm 24.11.5.
Slurm was compiled with the default options, except that we added --with-pmix.
There are some additional complications: we use NFSv4 with AD
authentication (hence auks), and we run SELinux in enforcing mode.
However, I do not currently see any evidence that these are related to the
problem we are having, though note that the slurm account is an AD account.
All relevant AD accounts have the RFC 2307 attributes (uid, gid, etc.)
configured.
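For completeness, this is roughly how the attribute resolution can be
checked on a compute node (nothing here looked wrong on our system):

```shell
# confirm the AD accounts resolve with numeric uid/gid via NSS on the node
getent passwd slurm     # should show the RFC 2307 uid/gid
id slurm
```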
I had also configured pam_slurm_adopt, but as it was causing issues I
backed out the changes that enabled it; it was too hard to troubleshoot
without being able to log in to the test compute node easily.
The issue I am seeing in testing is that, whatever job I launch, the
extern job step does not terminate when the job does.
These are the most relevant slurm.conf settings:
Epilog=/etc/slurm/epilog.sh
JobAcctGatherType=jobacct_gather/cgroup
JobCompType=jobcomp/none
KillOnBadExit=1
KillWait=30
ProctrackType=proctrack/cgroup
Prolog=/etc/slurm/prolog.sh
PrologFlags=Alloc,Contain
SelectTypeParameters=CR_Core_Memory
SelectType=select/cons_tres
SlurmUser=slurm
TaskPlugin=task/affinity,task/cgroup
TaskProlog=/etc/slurm/taskprolog.sh
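For reference, the effective values can be cross-checked against what the
running daemons actually loaded with something like (the grep pattern just
picks out the parameters listed above):

```shell
# compare the on-disk slurm.conf with the running configuration
scontrol show config | grep -Ei 'Proctrack|TaskPlugin|PrologFlags|JobAcctGather|KillWait|KillOnBadExit'
```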
The slurmd.log on the compute node for the job shows a sequence like this
(with -vv debug enabled):
[2025-06-26T14:57:23.272] select/cons_tres: init: select/cons_tres loaded
[2025-06-26T14:57:23.272] select/linear: init: Linear node selection plugin
loaded with argument 20
[2025-06-26T14:57:23.272] cred/munge: init: Munge credential signature
plugin loaded
[2025-06-26T14:57:23.272] [34.extern] debug: auth/munge: init: loaded
[2025-06-26T14:57:23.273] [34.extern] debug: Reading cgroup.conf file
/etc/slurm/cgroup.conf
[2025-06-26T14:57:23.279] [34.extern] debug: cgroup/v2: init: Cgroup v2
plugin loaded
[2025-06-26T14:57:23.290] [34.extern] debug: hash/k12: init: init:
KangarooTwelve hash plugin loaded
[2025-06-26T14:57:23.291] [34.extern] debug: CPUs:40 Boards:1 Sockets:2
CoresPerSocket:20 ThreadsPerCore:1
[2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: core
enforcement enabled
[2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup:
task_cgroup_memory_init: task/cgroup/memory: TotCfgRealMem:191587M
allowed:100%(enforced), swap:0%(enforced), max:100%(191587M)
max+swap:100%(383174M) min:30M
[2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: memory
enforcement enabled
[2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: device
enforcement enabled
[2025-06-26T14:57:23.291] [34.extern] debug: task/cgroup: init: Tasks
containment cgroup plugin loaded
[2025-06-26T14:57:23.291] [34.extern] task/affinity: init: task affinity
plugin loaded with CPU mask 0xffffffffff
[2025-06-26T14:57:23.291] [34.extern] debug: jobacct_gather/cgroup: init:
Job accounting gather cgroup plugin loaded
[2025-06-26T14:57:23.291] [34.extern] topology/default: init: topology
Default plugin loaded
[2025-06-26T14:57:23.292] [34.extern] debug: gpu/generic: init: init: GPU
Generic plugin loaded
[2025-06-26T14:57:23.293] [34.extern] debug: Setting slurmstepd(2666071)
oom_score_adj to -1000
[2025-06-26T14:57:23.293] [34.extern] debug: Message thread started pid =
2666071
[2025-06-26T14:57:23.294] [34.extern] debug: spank: opening plugin stack
/etc/slurm/plugstack.conf
[2025-06-26T14:57:23.294] [34.extern] debug: /etc/slurm/plugstack.conf: 1:
include "/etc/slurm/plugstack.conf.d/*.conf"
[2025-06-26T14:57:23.294] [34.extern] debug: spank: opening plugin stack
/etc/slurm/plugstack.conf.d/auks.conf
[2025-06-26T14:57:23.298] [34.extern] debug: spank:
/etc/slurm/plugstack.conf.d/auks.conf:57: Loaded plugin auks.so
[2025-06-26T14:57:23.298] [34.extern] debug: SPANK: appending plugin option
"auks"
[2025-06-26T14:57:23.352] [34.extern] spank-auks: new unique ccache is
KCM:71433:49222
<<<< This UID 71433 is the user running the job
[2025-06-26T14:57:23.364] [34.extern] spank-auks: user '71433' cred stored
in ccache KCM:71433:49222
[2025-06-26T14:57:23.380] [34.extern] debug: task/cgroup:
task_cgroup_cpuset_create: job abstract cores are '0'
[2025-06-26T14:57:23.381] [34.extern] debug: task/cgroup:
task_cgroup_cpuset_create: step abstract cores are '0'
[2025-06-26T14:57:23.381] [34.extern] debug: task/cgroup:
task_cgroup_cpuset_create: job physical CPUs are '0'
[2025-06-26T14:57:23.381] [34.extern] debug: task/cgroup:
task_cgroup_cpuset_create: step physical CPUs are '0'
[2025-06-26T14:57:23.381] [34.extern] task/cgroup: _memcg_initialize: job:
alloc=1024MB mem.limit=1024MB memsw.limit=1024MB
job_swappiness=18446744073709551614
[2025-06-26T14:57:23.381] [34.extern] task/cgroup: _memcg_initialize: step:
alloc=1024MB mem.limit=1024MB memsw.limit=1024MB
job_swappiness=18446744073709551614
[2025-06-26T14:57:23.385] [34.extern] debug: close_slurmd_conn: sending 0:
No error
[2025-06-26T14:57:24.371] launch task StepId=34.interactive request from
UID:71433 GID:70668 HOST:172.17.11.22 PORT:46948
[2025-06-26T14:57:24.372] task/affinity: lllp_distribution: JobId=34 manual
binding: mask_cpu,one_thread
[2025-06-26T14:57:24.372] debug: Waiting for job 34's prolog to complete
[2025-06-26T14:57:24.372] debug: Finished wait for job 34's prolog to
complete
[2025-06-26T14:57:24.377] select/cons_tres: init: select/cons_tres loaded
[2025-06-26T14:57:24.377] select/linear: init: Linear node selection plugin
loaded with argument 20
[2025-06-26T14:57:24.377] cred/munge: init: Munge credential signature
plugin loaded
[2025-06-26T14:57:24.377] [34.interactive] debug: auth/munge: init: loaded
[2025-06-26T14:57:24.379] [34.interactive] debug: Reading cgroup.conf file
/etc/slurm/cgroup.conf
[2025-06-26T14:57:24.384] [34.interactive] debug: cgroup/v2: init: Cgroup
v2 plugin loaded
[2025-06-26T14:57:24.403] [34.interactive] debug: hash/k12: init: init:
KangarooTwelve hash plugin loaded
[2025-06-26T14:57:24.404] [34.interactive] debug: CPUs:40 Boards:1
Sockets:2 CoresPerSocket:20 ThreadsPerCore:1
[2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: core
enforcement enabled
[2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup:
task_cgroup_memory_init: task/cgroup/memory: TotCfgRealMem:191587M
allowed:100%(enforced), swap:0%(enforced), max:100%(191587M)
max+swap:100%(383174M) min:30M
[2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: memory
enforcement enabled
[2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: device
enforcement enabled
[2025-06-26T14:57:24.404] [34.interactive] debug: task/cgroup: init: Tasks
containment cgroup plugin loaded
[2025-06-26T14:57:24.404] [34.interactive] task/affinity: init: task
affinity plugin loaded with CPU mask 0xffffffffff
[2025-06-26T14:57:24.404] [34.interactive] debug: jobacct_gather/cgroup:
init: Job accounting gather cgroup plugin loaded
[2025-06-26T14:57:24.404] [34.interactive] topology/default: init: topology
Default plugin loaded
[2025-06-26T14:57:24.404] [34.interactive] debug: gpu/generic: init: init:
GPU Generic plugin loaded
[2025-06-26T14:57:24.406] [34.interactive] debug: close_slurmd_conn:
sending 0: No error
[2025-06-26T14:57:24.406] [34.interactive] debug: Message thread started
pid = 2666089
[2025-06-26T14:57:24.406] [34.interactive] debug: Setting
slurmstepd(2666089) oom_score_adj to -1000
[2025-06-26T14:57:24.407] [34.interactive] debug: spank: opening plugin
stack /etc/slurm/plugstack.conf
[2025-06-26T14:57:24.407] [34.interactive] debug:
/etc/slurm/plugstack.conf: 1: include "/etc/slurm/plugstack.conf.d/*.conf"
[2025-06-26T14:57:24.407] [34.interactive] debug: spank: opening plugin
stack /etc/slurm/plugstack.conf.d/auks.conf
[2025-06-26T14:57:24.411] [34.interactive] debug: spank:
/etc/slurm/plugstack.conf.d/auks.conf:57: Loaded plugin auks.so
[2025-06-26T14:57:24.411] [34.interactive] debug: SPANK: appending plugin
option "auks"
[2025-06-26T14:57:24.461] [34.interactive] spank-auks: new unique ccache is
KCM:71433:44156
[2025-06-26T14:57:24.467] [34.interactive] spank-auks: user '71433' cred
stored in ccache KCM:71433:44156
[2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup:
task_cgroup_cpuset_create: job abstract cores are '0'
[2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup:
task_cgroup_cpuset_create: step abstract cores are '0'
[2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup:
task_cgroup_cpuset_create: job physical CPUs are '0'
[2025-06-26T14:57:24.480] [34.interactive] debug: task/cgroup:
task_cgroup_cpuset_create: step physical CPUs are '0'
[2025-06-26T14:57:24.481] [34.interactive] task/cgroup: _memcg_initialize:
job: alloc=1024MB mem.limit=1024MB memsw.limit=1024MB
job_swappiness=18446744073709551614
[2025-06-26T14:57:24.481] [34.interactive] task/cgroup: _memcg_initialize:
step: alloc=1024MB mem.limit=1024MB memsw.limit=1024MB
job_swappiness=18446744073709551614
[2025-06-26T14:57:24.501] [34.interactive] warning: restricted to a subset
of cpus
[2025-06-26T14:57:24.503] [34.interactive] debug: stdin uses a pty object
[2025-06-26T14:57:24.504] [34.interactive] debug: init pty size 48:324
[2025-06-26T14:57:24.506] [34.interactive] debug levels are stderr='error',
logfile='debug', syslog='quiet'
[2025-06-26T14:57:24.507] [34.interactive] debug: IO handler started
pid=2666089
[2025-06-26T14:57:24.517] [34.interactive] spank-auks: credential renewer
launched (pid=2666107)
[2025-06-26T14:57:24.517] [34.interactive] starting 1 tasks
[2025-06-26T14:57:24.517] [34.interactive] task 0 (2666108) started
2025-06-26T14:57:24
[2025-06-26T14:57:24.537] [34.interactive] debug: task/affinity:
task_p_pre_launch: affinity StepId=34.interactive, task:0
bind:mask_cpu,one_thread
[2025-06-26T14:57:24.537] [34.interactive] debug: [job 34] attempting to
run slurm task_prolog [/etc/slurm/taskprolog.sh]
[2025-06-26T14:57:24.537] [34.interactive] debug: Sending launch resp rc=0
[2025-06-26T14:57:24.563] [34.interactive] debug: export
name:TMPDIR:val:/local/scratch/34:
[2025-06-26T14:59:42.726] [34.interactive] task 0 (2666108) exited with exit
code 0.
<<<< Here the interactive job was closed
[2025-06-26T14:59:42.727] [34.interactive] spank-auks: all tasks exited,
killing credential renewer (pid=2666107)
[2025-06-26T14:59:42.729] [34.interactive] debug: task/affinity:
task_p_post_term: affinity StepId=34.interactive, task 0
[2025-06-26T14:59:42.729] [34.interactive] debug: signaling condition
[2025-06-26T14:59:42.729] [34.interactive] debug: jobacct_gather/cgroup:
fini: Job accounting gather cgroup plugin unloaded
[2025-06-26T14:59:42.729] [34.interactive] debug: Waiting for IO
[2025-06-26T14:59:42.729] [34.interactive] debug: Closing debug channel
[2025-06-26T14:59:42.729] [34.interactive] debug: IO handler exited, rc=0
[2025-06-26T14:59:42.729] [34.interactive] debug: task/cgroup: fini: Tasks
containment cgroup plugin unloaded
[2025-06-26T14:59:42.731] [34.interactive] debug: slurm_recv_timeout at 0
of 4, recv zero bytes
[2025-06-26T14:59:42.738] [34.interactive] spank-auks: Destroyed ccache
KCM:71433:44156
[2025-06-26T14:59:42.740] debug: _rpc_terminate_job: uid = 71755 JobId=34
<<<< That uid 71755 is the slurm account, which does not run any daemons
on a compute node
[2025-06-26T14:59:42.740] debug: credential for job 34 revoked
[2025-06-26T14:59:42.742] [34.extern] debug: Handling
REQUEST_SIGNAL_CONTAINER
[2025-06-26T14:59:42.742] [34.extern] debug: _handle_signal_container for
StepId=34.extern uid=71755 signal=18 flag=0x0
[2025-06-26T14:59:42.742] [34.extern] Sent signal 18 to StepId=34.extern
[2025-06-26T14:59:42.742] [34.interactive] debug: Handling
REQUEST_SIGNAL_CONTAINER
[2025-06-26T14:59:42.742] [34.interactive] debug: _handle_signal_container
for StepId=34.interactive uid=71755 signal=18 flag=0x0
[2025-06-26T14:59:42.742] [34.interactive] Sent signal 18 to
StepId=34.interactive
[2025-06-26T14:59:42.743] [34.extern] debug: Handling
REQUEST_SIGNAL_CONTAINER
[2025-06-26T14:59:42.743] [34.extern] debug: _handle_signal_container for
StepId=34.extern uid=71755 signal=15 flag=0x0
[2025-06-26T14:59:42.743] [34.extern] Sent signal 15 to StepId=34.extern
[2025-06-26T14:59:42.743] [34.interactive] debug: Handling
REQUEST_SIGNAL_CONTAINER
[2025-06-26T14:59:42.743] [34.interactive] debug: _handle_signal_container
for StepId=34.interactive uid=71755 signal=15 flag=0x0
[2025-06-26T14:59:42.743] [34.interactive] Sent signal 15 to
StepId=34.interactive
[2025-06-26T14:59:42.744] [34.interactive] debug: Handling REQUEST_STATE
[2025-06-26T14:59:42.744] [34.interactive] debug: Message thread exited
[2025-06-26T14:59:42.744] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:42.763] [34.interactive] done with step
[2025-06-26T14:59:42.766] [34.extern] debug: signaling condition
[2025-06-26T14:59:42.766] [34.extern] debug: task/affinity:
task_p_post_term: affinity StepId=34.extern, task 0
[2025-06-26T14:59:42.766] [34.extern] debug: jobacct_gather/cgroup: fini:
Job accounting gather cgroup plugin unloaded
[2025-06-26T14:59:42.766] [34.extern] debug: task/cgroup: fini: Tasks
containment cgroup plugin unloaded
[2025-06-26T14:59:42.766] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:42.766] [34.extern] debug: Terminate signal (SIGTERM)
received
[2025-06-26T14:59:42.768] [34.extern] error: setgroups: Operation not
permitted
[2025-06-26T14:59:42.768] [34.extern] error: _shutdown_x11_forward: Unable
to drop privileges
[2025-06-26T14:59:42.771] [34.extern] spank-auks: Destroyed ccache
KCM:71433:49222
[2025-06-26T14:59:42.817] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:42.918] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:43.419] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:44.420] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:45.420] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:46.421] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:47.422] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:48.423] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:49.424] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:50.425] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:51.426] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:52.426] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T14:59:53.427] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:03.428] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:12.429] [34.extern] debug: Handling
REQUEST_STEP_TERMINATE
[2025-06-26T15:00:12.429] [34.extern] debug: _handle_terminate for
StepId=34.extern uid=0
[2025-06-26T15:00:12.429] [34.extern] Sent SIGKILL signal to
StepId=34.extern
[2025-06-26T15:00:12.429] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:12.450] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:12.501] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:12.602] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:13.102] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:14.103] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:14.104] [34.extern] debug: Handling
REQUEST_STEP_TERMINATE
[2025-06-26T15:00:14.104] [34.extern] debug: _handle_terminate for
StepId=34.extern uid=0
[2025-06-26T15:00:14.104] [34.extern] Sent SIGKILL signal to
StepId=34.extern
[2025-06-26T15:00:15.105] [34.extern] debug: Handling REQUEST_STATE
[2025-06-26T15:00:15.105] [34.extern] debug: Handling
REQUEST_STEP_TERMINATE
(...the REQUEST_STATE / REQUEST_STEP_TERMINATE cycle repeats from here.)
We are left with a process "slurmstepd: [34.extern]" running, and a matching
socket file for job 34 in /var/spool/slurmd; the job shows state 'CG'
(Completing). I can clean up by issuing kill -9 to that 'extern'
slurmstepd process on the compute node. I do not know of a way to get any
further logging output from that job step, but it has a file descriptor
open to /var/log/slurm/slurmd.log (which is what we configured), so I
assume it is writing some of the log above.
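For anyone trying to reproduce this, the stuck step can be located with
something like (job id 34 assumed):

```shell
# list jobs stuck in Completing, then find the orphaned extern slurmstepd
squeue --states=CG --noheader -o '%A %T %N'
pgrep -af 'slurmstepd: \[34\.extern\]'
# last resort, as noted above: kill -9 that pid on the compute node
```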
In some runs I also see this just as the main task exits (from the .0
job step, not from .extern):
[2025-06-27T21:00:47.265] [39.0] error: common_file_write_content: unable to
open '/sys/fs/cgroup/system.slice/slurmstepd.scope/job_39/step_0/user/cgroup.freeze'
for writing: Permission denied
I can see that the directories
'/sys/fs/cgroup/system.slice/slurmstepd.scope/job_NN' remain until the next
reboot, including the
/sys/fs/cgroup/system.slice/slurmstepd.scope/job_NN/step_extern folder. The
permissions of the files referred to in the 'Permission denied' errors seem
fine to me:
# ls -lhZ /sys/fs/cgroup/system.slice/slurmstepd.scope/job_42/step_extern/user/cgroup.freeze
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jun 27 21:46
/sys/fs/cgroup/system.slice/slurmstepd.scope/job_42/step_extern/user/cgroup.freeze
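While the step is stuck, these are the cgroup files I look at (job id and
paths assumed to match the layout above):

```shell
# inspect what is left in the stuck extern step's cgroup (job 42 assumed)
d=/sys/fs/cgroup/system.slice/slurmstepd.scope/job_42/step_extern
cat "$d"/cgroup.procs          # any PIDs still charged to the step
cat "$d"/user/cgroup.freeze    # 1 = frozen, 0 = not frozen
cat "$d"/user/cgroup.events    # 'populated' shows whether tasks remain
```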
There are no SELinux alerts on the system. I am not sure whether the
error messages about setgroups and _shutdown_x11_forward are the actual
problem, or just a symptom of something else. The only system I have
to compare with is running Slurm 19.05 on CentOS 7 and is rather different.
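Regarding SELinux: if anyone suspects it despite the lack of alerts, I
believe denials suppressed by dontaudit rules can be surfaced with
something like:

```shell
# look for AVC denials, then temporarily disable dontaudit suppression
ausearch -m avc,user_avc -ts recent
semodule -DB      # rebuild policy with dontaudit rules disabled
# ...reproduce the job, re-check ausearch...
semodule -B       # restore normal policy
```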
I would be interested to know if anyone else has had problems with extern
job steps not shutting down.
William