weird sacct behavior? - slurm-users

10 Apr 2025


Hi everyone,
I am currently stuck with an sacct issue and would appreciate any 
help/hints/ideas:
My users can no longer retrieve data for their currently running jobs
through sacct. Running sacct -a as root reproduces the issue: it does not
show running jobs, but both sacct -j <JobID> and squeue -j <JobID> do.
AFAICT, this is not intended behavior (?). Specifying longer time windows
with sacct -S ... -E did not help either.
root@slurmmaster:~# sacct -a | grep 154415 # this returns nothing
root@slurmmaster:~# sacct -j 154415
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
154415       allocation               primevo          0    PENDING      0:0
154415.batch      batch               primevo          2    RUNNING      0:0
154415.exte+     extern               primevo          2    RUNNING      0:0
root@slurmmaster:~# squeue -j 154415
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             154415  standard genedrop username  R       1:31      1 hpc020
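In case it helps with reproducing this, here is a minimal sketch of how one could list every running job that squeue reports but sacct -a omits, instead of grepping for a single job ID. The helper name missing_from_sacct is made up for illustration, and the squeue/sacct invocations in the trailing comment are only an assumption about a reasonable way to get bare job IDs; they are not taken from the output above.

```shell
# Hypothetical helper: print the job IDs that are in the first
# newline-separated list but not in the second one.
missing_from_sacct() {
    tmp=$(mktemp) || return 1
    # Second list (e.g. what sacct -a reports), sorted and de-duplicated.
    printf '%s\n' "$2" | sort -u > "$tmp"
    # First list (e.g. what squeue reports); drop empty lines, then keep
    # only the lines that do not occur in the second list.
    printf '%s\n' "$1" | sed '/^$/d' | sort -u | comm -23 - "$tmp"
    rm -f "$tmp"
}

# Assumed invocation on a live cluster (adjust the options as needed):
# missing_from_sacct "$(squeue -h -t RUNNING -o %i)" \
#                    "$(sacct -a -n -X -P -o JobID)"
```

An empty result would mean sacct -a lists every job that squeue shows as running; any job ID printed is visible to squeue but missing from sacct -a.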
Also, possibly related, we had a slurmdbd crash before this changed.
We run Ubuntu Server 24.04 LTS with Slurm 24.05.4, using a MariaDB 
accounting database hosted on the same machine as the Slurm controller.
Does anyone here have any ideas?
Best,
Pierre
-- 
Pierre Abele, M.Sc.

HPC Administrator
Max-Planck-Institute for Evolutionary Anthropology
Department of Primate Behavior and Evolution

Deutscher Platz 6
04103 Leipzig

Room: U2.80
E-Mail: pierre_abele@eva.mpg.de
Phone: +49 (0) 341 3550 245