[slurm-users] speed / efficiency of sacct vs. scontrol

Loris Bennett loris.bennett at fu-berlin.de
Thu Feb 23 12:40:30 UTC 2023


Hi David,

David Laehnemann <david.laehnemann at hhu.de> writes:

> Dear Slurm users and developers,
>
> TL;DR:
> Do any of you know if `scontrol` status checks of jobs are always
> expected to be quicker than `sacct` job status checks? Do you have any
> comparative timings between the two commands?
> And consequently, would using `scontrol` thus be the better default
> option (as opposed to `sacct`) for repeated job status checks by a
> workflow management system?

I am probably being a bit naive, but I would have thought that the batch
system should just be able to start your jobs when resources become
available.  Why do you need to check the status of jobs?  I would tend
to think that it is not something users should be doing.

Cheers,

Loris

> And here's the long version with background infos and linkouts:
>
> I have recently started using a Slurm cluster and am a regular user of
> the workflow management system snakemake (
> https://snakemake.readthedocs.io/en/latest/). This workflow manager
> recently integrated support for running analysis workflows pretty
> seamlessly on Slurm clusters. It takes care of managing all job
> dependencies and handles the submission of jobs according to your global
> (and job-specific) resource configurations.
>
> One little hiccup when starting to use the snakemake-Slurm combination
> was a snakemake-internal rate-limitation for checking job statuses. You
> can find the full story here:
> https://github.com/snakemake/snakemake/pull/2136
>
> For debugging this, I obtained timings for `sacct` and `scontrol`, with
> `scontrol` consistently returning the job status about 2.5x faster than
> `sacct`. The timings are recorded here:
> https://github.com/snakemake/snakemake/blob/b91651d5ea2314b954a3b4b096d7f327ce743b94/snakemake/scheduler.py#L199-L210
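>
> For concreteness, the comparison essentially boils down to timing repeated
> calls like in the following rough Python sketch (JOB_ID is a placeholder,
> and the exact `sacct` format fields are only an example):
>
>     import subprocess
>     import time
>
>     JOB_ID = "12345678"  # placeholder job ID
>
>     def mean_runtime(cmd, repeats=10):
>         """Run `cmd` several times and return the mean wall-clock time."""
>         start = time.perf_counter()
>         for _ in range(repeats):
>             subprocess.run(cmd, capture_output=True, check=False)
>         return (time.perf_counter() - start) / repeats
>
>     sacct_cmd = ["sacct", "-j", JOB_ID, "--format=State",
>                  "--noheader", "--parsable2"]
>     scontrol_cmd = ["scontrol", "show", "job", JOB_ID]
>
>     print(f"sacct:    {mean_runtime(sacct_cmd):.3f} s per call")
>     print(f"scontrol: {mean_runtime(scontrol_cmd):.3f} s per call")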
>
> However, `sacct` is currently used by default for regularly checking the
> status of submitted jobs, and `scontrol` is only used as a fallback whenever
> `sacct` does not find the job (for example because it is not yet running);
> see the sketch after the questions below. Now, I was wondering whether
> switching the default to `scontrol` would make sense. Thus, I would like to
> ask:
>
> 1) Slurm users: do you see similar timings on other Slurm clusters, and
> do they confirm that `scontrol` is consistently quicker?
>
> 2) Slurm developers: is `scontrol` expected to be quicker, given its
> implementation, and is using `scontrol` also the option that puts less
> strain on the scheduler in general?
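>
> For reference, the current default behaviour corresponds roughly to the
> following fallback pattern (a sketch only; the actual snakemake code
> differs in the details):
>
>     import subprocess
>
>     def job_status(jobid):
>         """Ask sacct first; fall back to scontrol if sacct returns nothing."""
>         out = subprocess.run(
>             ["sacct", "-j", jobid, "--format=State",
>              "--noheader", "--parsable2"],
>             capture_output=True, text=True,
>         ).stdout.strip()
>         if out:
>             # sacct prints one state per line (the job plus its steps)
>             return out.splitlines()[0]
>         out = subprocess.run(
>             ["scontrol", "show", "job", jobid],
>             capture_output=True, text=True,
>         ).stdout
>         # scontrol prints key=value pairs, one of which is JobState=...
>         for token in out.split():
>             if token.startswith("JobState="):
>                 return token.split("=", 1)[1]
>         return "UNKNOWN"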
>
> Many thanks and best regards,
> David
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin


