[slurm-users] sacct returns nothing after reboot
Roger Mason
rmason at mun.ca
Tue May 12 12:08:54 UTC 2020
Hello,
Yesterday I instituted job accounting via mysql on my (FreeBSD 11.3)
test cluster. The cluster consists of a machine running
slurmctld+slurmdbd and two running slurmd (slurm version 20.02.1).
After experiencing a slurmdbd core dump when using mysql-5.7.30
(reported on this list on May 5) I installed 5.7.28 instead.
Before yesterday I had no accounting of any kind. I had observed that
job IDs always restarted at 2 after a reboot. After installing mysql
and setting it up, I ran a few test jobs and verified that sacct
listed them: all seemed well.
This morning, after rebooting the machine running slurmctld+slurmdbd,
sacct returns nothing:
rmason sacct --allusers
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
so it seems that yesterday's jobs have been forgotten.
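One thing worth ruling out before concluding the jobs are lost: as far as I understand the sacct man page, the default reporting window begins at 00:00:00 of the current day, so jobs that ran yesterday would not be listed unless a start time is given. A minimal sketch, assuming the test jobs ran on 2020-05-11 (the day before this message) and that the extra --format fields are wanted:

```shell
#!/bin/sh
# Assumption: sacct's default window starts at midnight today, so an
# explicit --starttime is needed to see jobs from the previous day.
start="2020-05-11"   # the day before this message was posted
cmd="sacct --allusers --starttime $start --format=JobID,JobName,Partition,State,Start,End"
echo "$cmd"
# Run the command only where sacct is actually installed.
if command -v sacct >/dev/null 2>&1; then
    $cmd
fi
```

If yesterday's jobs show up with this, the accounting data survived the reboot and only the default time window was hiding it.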
When I connect to mysql as the user owning the databases, the
information still seems to be present. For example,
select * from imacbeastie_job_table;
returns information about the test jobs I ran yesterday.
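A narrower query makes it easier to see when the stored jobs ran. The sketch below is only an illustration: the database name slurm_acct_db and the MySQL user slurm are assumptions (common slurmdbd settings, not taken from my setup as described here), and RUN_QUERY is a hypothetical switch so that the script only prints the query unless asked to run it.

```shell
#!/bin/sh
# Assumed names: adjust db and user to match slurmdbd.conf.
db="slurm_acct_db"
user="slurm"
# Show job IDs with start and end times converted from epoch seconds.
query="SELECT id_job, job_name, FROM_UNIXTIME(time_start) AS started, FROM_UNIXTIME(time_end) AS ended FROM imacbeastie_job_table ORDER BY time_submit;"
echo "$query"
# RUN_QUERY is a hypothetical opt-in; mysql -p will prompt for a password.
if [ -n "$RUN_QUERY" ] && command -v mysql >/dev/null 2>&1; then
    mysql -u "$user" -p "$db" -e "$query"
fi
```

Comparing the started and ended columns with the time window sacct reports on should show whether the rows are simply outside that window.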
As a further test I just ran another test job:
squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 2      imac     test   rmason  R       0:03      1 patchperthite
I notice that the job ID starts at 2 again, even though I ran 5 or 6 test jobs yesterday.
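The job ID counter restarting may be a separate issue from accounting. My understanding is that slurmctld keeps its state, including the next job ID, under StateSaveLocation, so if that directory does not survive a reboot the numbering would start over. A rough sketch for checking where it points; state_dir_from is a hypothetical helper name:

```shell
#!/bin/sh
# Pull the value out of a "Key = value" line as printed by
# "scontrol show config".
state_dir_from() {
    printf '%s\n' "$1" | sed -n 's/^StateSaveLocation *= *//p'
}

# Query the running controller only where scontrol is available.
if command -v scontrol >/dev/null 2>&1; then
    dir=$(state_dir_from "$(scontrol show config | grep '^StateSaveLocation')")
    echo "StateSaveLocation: $dir"
    # A path under /tmp or another directory cleared at boot would be suspect.
    ls -ld "$dir"
fi
```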
sacct now returns information:
sacct --allusers
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2                  test       imac                     2  COMPLETED      0:0
2.batch           batch                                2  COMPLETED      0:0
2.0            hostname                                1  COMPLETED      0:0
2.1               sleep                                1  COMPLETED      0:0
but only for the test job I ran today.
I appreciate any help in getting accounting to work properly.
Thanks,
Roger