[slurm-users] speed / efficiency of sacct vs. scontrol

David Laehnemann david.laehnemann at hhu.de
Thu Feb 23 10:55:18 UTC 2023


Dear Slurm users and developers,

TL;DR:
Do any of you know if `scontrol` status checks of jobs are always
expected to be quicker than `sacct` job status checks? Do you have any
comparative timings between the two commands?
And consequently, would `scontrol` be the better default option (as
opposed to `sacct`) for repeated job status checks by a workflow
management system?


And here's the long version with background information and links:

I have recently started using a Slurm cluster and am a regular user of
the workflow management system snakemake (
https://snakemake.readthedocs.io/en/latest/). This workflow manager
recently integrated support for running analysis workflows pretty
seamlessly on Slurm clusters. It takes care of managing all job
dependencies and handles the submission of jobs according to your global
(and job-specific) resource configurations.

One little hiccup when starting to use the snakemake-Slurm combination
was a snakemake-internal rate limit on checking job statuses. You
can find the full story here:
https://github.com/snakemake/snakemake/pull/2136

For debugging this, I obtained timings on `sacct` and `scontrol`, with
`scontrol` consistently about 2.5x quicker in returning the job status
when compared to `sacct`. Timings are recorded here:
https://github.com/snakemake/snakemake/blob/b91651d5ea2314b954a3b4b096d7f327ce743b94/snakemake/scheduler.py#L199-L210
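For anyone who wants to collect comparable numbers on their own cluster,
here is a minimal sketch of how such a comparison could be timed. It is
not the script behind the timings linked above; the function names
`time_command` and `compare` are made up for illustration, and the exact
`sacct` / `scontrol` option sets are just one reasonable choice for
asking about a single job.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=5):
    """Return the wall-clock durations (in seconds) of `repeats` runs of cmd."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # The output is discarded; only the elapsed time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


def compare(job_id, repeats=5):
    """Median runtime per command for one job ID.

    Assumes the Slurm client tools are on PATH; otherwise subprocess.run
    raises FileNotFoundError.
    """
    commands = {
        # -X: allocation only, -P: parsable output, -n: no header
        "sacct": ["sacct", "-X", "-P", "-n", "-o", "State", "-j", str(job_id)],
        # -o: print the job record on a single line
        "scontrol": ["scontrol", "-o", "show", "job", str(job_id)],
    }
    return {
        name: statistics.median(time_command(cmd, repeats))
        for name, cmd in commands.items()
    }
```

Calling `compare(<jobid>)` on a login node returns a dict of median
seconds per command. Medians over several runs are used because single
calls can be skewed by caching or momentary load on the controller or
the accounting database.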

However, `sacct` is currently used by default for regularly checking the
status of submitted jobs, and `scontrol` is only a fallback whenever
`sacct` doesn't find the job (for example because it is not yet
running). Now, I was wondering if switching the default to `scontrol`
would make sense. Thus, I would like to ask:

1) Slurm users: do you see similar timings on other Slurm clusters, and
do they confirm that `scontrol` is consistently quicker?

2) Slurm developers: is `scontrol` expected to be quicker given its
implementation, and would using `scontrol` also be the option that
puts less strain on the scheduler in general?

Many thanks and best regards,
David



