[slurm-users] speed / efficiency of sacct vs. scontrol
Sean Maxwell
stm at case.edu
Thu Feb 23 12:55:19 UTC 2023
Hi David,
scontrol - talks to slurmctld directly over RPC, so it responds faster, but
every request puts load on the scheduler itself.
sacct - interacts with slurmdbd, so it doesn't place additional load on the
scheduler.
There is a balance to reach, but the scontrol approach is riskier and can
start to interfere with the cluster operation if used incorrectly.
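
To make the trade-off concrete, here is a minimal sketch of a status check
that asks sacct first, batches all job IDs into one call, and only falls back
to scontrol for jobs that sacct does not report yet. The function names
(parse_sacct, parse_scontrol, job_states) are hypothetical and chosen for
illustration; this is not how snakemake implements it. It assumes sacct and
scontrol are on PATH and that accounting via slurmdbd is enabled.

```python
import subprocess


def parse_sacct(output):
    """Parse `sacct -X -P -n -o JobID,State` output into {job_id: state}."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id, _, state = line.partition("|")
        # sacct can report e.g. "CANCELLED by 1000"; keep only the first word.
        words = state.split()
        states[job_id.strip()] = words[0] if words else ""
    return states


def parse_scontrol(output):
    """Parse `scontrol -o show job` output (one job per line)."""
    states = {}
    for line in output.splitlines():
        fields = {}
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                # Keep the first occurrence of each key on the line.
                fields.setdefault(key, value)
        if "JobId" in fields and "JobState" in fields:
            states[fields["JobId"]] = fields["JobState"]
    return states


def job_states(job_ids):
    """Return {job_id: state}; one sacct call, scontrol only as a fallback."""
    result = subprocess.run(
        ["sacct", "-X", "-P", "-n", "-o", "JobID,State", "-j", ",".join(job_ids)],
        capture_output=True, text=True, check=True,
    )
    states = parse_sacct(result.stdout)
    for job_id in job_ids:
        if job_id not in states:
            # This request goes to slurmctld, so it is used sparingly.
            fallback = subprocess.run(
                ["scontrol", "-o", "show", "job", job_id],
                capture_output=True, text=True,
            )
            states.update(parse_scontrol(fallback.stdout))
    return states
```

Batching the job IDs into a single sacct call keeps the number of requests
per polling round small, which matters more for cluster health than the
latency of any single call.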
Best,
-Sean
On Thu, Feb 23, 2023 at 5:59 AM David Laehnemann <david.laehnemann at hhu.de>
wrote:
> Dear Slurm users and developers,
>
> TL;DR:
> Do any of you know if `scontrol` status checks of jobs are always
> expected to be quicker than `sacct` job status checks? Do you have any
> comparative timings between the two commands?
> And consequently, would using `scontrol` thus be the better default
> option (as opposed to `sacct`) for repeated job status checks by a
> workflow management system?
>
>
> And here's the long version with background information and links:
>
> I have recently started using a Slurm cluster and am a regular user of
> the workflow management system snakemake (
> https://snakemake.readthedocs.io/en/latest/). This workflow manager
> recently integrated support for running analysis workflows pretty
> seamlessly on Slurm clusters. It takes care of managing all job
> dependencies and handles the submission of jobs according to your global
> (and job-specific) resource configurations.
>
> One little hiccup when starting to use the snakemake-Slurm combination
> was a snakemake-internal rate-limitation for checking job statuses. You
> can find the full story here:
> https://github.com/snakemake/snakemake/pull/2136
>
> For debugging this, I obtained timings on `sacct` and `scontrol`, with
> `scontrol` consistently about 2.5x quicker in returning the job status
> when compared to `sacct`. Timings are recorded here:
>
> https://github.com/snakemake/snakemake/blob/b91651d5ea2314b954a3b4b096d7f327ce743b94/snakemake/scheduler.py#L199-L210
>
> However, `sacct` is currently the default for regularly checking the status
> of submitted jobs, and `scontrol` is only a fallback whenever
> `sacct` doesn't find the job (for example because it is not yet
> running). Now, I was wondering if switching the default to `scontrol`
> would make sense. Thus, I would like to ask:
>
> 1) Slurm users, whether they also have similar timings on different
> Slurm clusters and whether those confirm that `scontrol` is
> consistently quicker?
>
> 2) Slurm developers, whether `scontrol` is expected to be quicker from
> its implementation and whether using `scontrol` would also be the
> option that puts less strain on the scheduler in general?
>
> Many thanks and best regards,
> David
>
>
>
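
For anyone who wants to collect comparable timings on their own cluster, the
following is a rough sketch of a timing harness. The helper names
(time_command, compare) and the choice of five repeats with a median are
assumptions made for illustration, not the method used for the timings linked
in the quoted message. Keep the repeat count low: every scontrol call is a
request to slurmctld.

```python
import statistics
import subprocess
import time


def time_command(argv, repeats=5, run=subprocess.run):
    """Return the median wall-clock time in seconds of running argv."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is captured and discarded; only the elapsed time matters.
        run(argv, capture_output=True, check=False)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def compare(job_id, repeats=5):
    """Time one sacct and one scontrol status query for the same job."""
    sacct_s = time_command(
        ["sacct", "-X", "-P", "-n", "-o", "State", "-j", job_id], repeats
    )
    scontrol_s = time_command(["scontrol", "-o", "show", "job", job_id], repeats)
    return {
        "sacct_s": sacct_s,
        "scontrol_s": scontrol_s,
        "ratio": sacct_s / scontrol_s,
    }
```

Calling compare("12345") with a real job ID returns the two medians and their
ratio. Wall-clock time only measures latency as seen by the client; it says
nothing about the load each command places on slurmctld or slurmdbd.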