<div dir="ltr"><div>Hi David,</div><div><br></div><div>scontrol interacts with slurmctld via RPC, so it is faster, but its requests put load on the scheduler itself.</div><div>sacct interacts with slurmdbd, so it doesn't place additional load on the scheduler.</div><div><br></div><div>There is a balance to strike, but the scontrol approach is riskier: used incorrectly (for example, by polling it too frequently), it can start to interfere with cluster operation.</div><div><br></div><div>Best,<br></div><div><br></div><div>-Sean<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 23, 2023 at 5:59 AM David Laehnemann &lt;<a href="mailto:david.laehnemann@hhu.de">david.laehnemann@hhu.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Slurm users and developers,<br>

<br>
TL;DR:<br>
Do any of you know whether `scontrol` job status checks are always expected to be quicker than `sacct` job status checks? Do you have any comparative timings for the two commands?<br>
And, consequently, would `scontrol` be the better default option (as opposed to `sacct`) for repeated job status checks by a workflow management system?<br>
<br>
<br>

And here is the long version, with background information and links:<br>
<br>
I have recently started using a Slurm cluster and am a regular user of the workflow management system snakemake (<a href="https://snakemake.readthedocs.io/en/latest/" rel="noreferrer" target="_blank">https://snakemake.readthedocs.io/en/latest/</a>). This workflow manager recently added support for running analysis workflows fairly seamlessly on Slurm clusters. It manages all job dependencies and submits jobs according to your global (and job-specific) resource configurations.<br>
<br>

One small hiccup when I started using the snakemake-Slurm combination was a snakemake-internal rate limit on job status checks. You can find the full story here:<br>
<a href="https://github.com/snakemake/snakemake/pull/2136" rel="noreferrer" target="_blank">https://github.com/snakemake/snakemake/pull/2136</a><br>
<br>
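For readers unfamiliar with this kind of throttling, here is a minimal sketch of a sliding-window limit on job status checks. It is a hypothetical Python helper (`StatusCheckLimiter`, `max_calls` and `period` are illustrative names), not snakemake's actual implementation, which is discussed in the pull request linked above.<br>

```python
import time
from collections import deque


class StatusCheckLimiter:
    """Allow at most `max_calls` status checks in any `period`-second window.

    Hypothetical helper for illustration only; not snakemake's implementation.
    """

    def __init__(self, max_calls: int, period: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._calls = deque()  # timestamps of recent, permitted checks

    def try_acquire(self) -> bool:
        """Record a check and return True if it is allowed, else return False."""
        now = self.clock()
        # Forget checks that have fallen out of the sliding window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

With `max_calls=2` and `period=10.0`, a third call to `try_acquire()` within the same 10-second window returns `False`; the caller would then defer the check instead of invoking `sacct` or `scontrol` again.<br>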

While debugging this, I timed `sacct` and `scontrol`: `scontrol` was consistently about 2.5x quicker than `sacct` at returning the job status. The timings are recorded here:<br>
<a href="https://github.com/snakemake/snakemake/blob/b91651d5ea2314b954a3b4b096d7f327ce743b94/snakemake/scheduler.py#L199-L210" rel="noreferrer" target="_blank">https://github.com/snakemake/snakemake/blob/b91651d5ea2314b954a3b4b096d7f327ce743b94/snakemake/scheduler.py#L199-L210</a><br>
<br>
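Comparable timings could be collected on other clusters with a small harness like the following sketch. It assumes Python 3.9+ and an existing job ID; `time_command` and `compare` are hypothetical names, and taking the median of several runs is just one way to smooth out noise from a busy controller or database.<br>

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 10) -> float:
    """Return the median wall-clock time (seconds) of running `cmd` `repeats` times."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output; only the duration of the call matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def compare(job_id: str, repeats: int = 10) -> dict[str, float]:
    """Median time each command needs to report on `job_id`."""
    return {
        # Answered by slurmctld.
        "scontrol": time_command(["scontrol", "show", "job", job_id], repeats),
        # Answered by slurmdbd; -X: allocation only, -n: no header, -P: pipe-separated.
        "sacct": time_command(
            ["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "JobID,State"], repeats
        ),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(compare(sys.argv[1]))
```

Saved as, say, `compare_status.py` and run as `python compare_status.py 12345` (script name and job ID are placeholders), it prints the median time for each command.<br>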

However, snakemake currently uses `sacct` by default to check the status of submitted jobs regularly, and falls back to `scontrol` only when `sacct` doesn't find a job (for example, because it is not yet running). I am now wondering whether switching the default to `scontrol` would make sense. So I would like to ask:<br>
<br>

1) Slurm users: do you see similar timings on other Slurm clusters, and do they confirm that `scontrol` is consistently quicker?<br>
<br>
2) Slurm developers: is `scontrol` expected to be quicker, given how it is implemented, and would using `scontrol` also put less strain on the scheduler in general?<br>
<br>
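The current default described above (ask `sacct` first, fall back to `scontrol`) can be sketched as follows. The function names and the injectable `runner` argument are hypothetical, chosen so the logic can be exercised without a cluster; this is not snakemake's code. Switching the default would amount to swapping the order of the two queries in `job_state`.<br>

```python
import subprocess
from typing import Callable, Optional

Runner = Callable[[list[str]], str]


def run_slurm(cmd: list[str]) -> str:
    """Run a Slurm CLI command and return its stdout ('' on failure)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout if result.returncode == 0 else ""


def state_from_sacct(output: str, job_id: str) -> Optional[str]:
    # With -X -n -P -o JobID,State each line looks like "12345|RUNNING".
    for line in output.splitlines():
        fields = line.split("|")
        if len(fields) >= 2 and fields[0] == job_id:
            parts = fields[1].split()
            # Keep the first word, e.g. "CANCELLED by 1000" -> "CANCELLED".
            return parts[0] if parts else None
    return None


def state_from_scontrol(output: str) -> Optional[str]:
    # `scontrol show job` prints whitespace-separated Key=Value pairs.
    for token in output.split():
        key, _, value = token.partition("=")
        if key == "JobState":
            return value
    return None


def job_state(job_id: str, runner: Runner = run_slurm) -> Optional[str]:
    """Ask slurmdbd (sacct) first; fall back to slurmctld (scontrol)."""
    out = runner(["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "JobID,State"])
    state = state_from_sacct(out, job_id)
    if state is None:
        state = state_from_scontrol(runner(["scontrol", "show", "job", job_id]))
    return state
```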

Many thanks and best regards,<br>
David<br>

</blockquote></div>
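<div>Following up on Sean's point about load: whichever daemon answers, one way to keep the number of requests down is to ask about all outstanding jobs in a single call rather than one call per job. The sketch below assumes Python 3.9+ and relies on `sacct -j` accepting a comma-separated list of job IDs; the helper names are hypothetical. Jobs that `sacct` does not report yet (David mentions jobs that are not yet running) will simply be missing from the returned dictionary, so a caller would still need a fallback for them.</div>

```python
import subprocess


def batch_sacct_cmd(job_ids: list[str]) -> list[str]:
    """One sacct call (one slurmdbd query) covering many jobs, instead of one per job."""
    return ["sacct", "-j", ",".join(job_ids), "-X", "-n", "-P", "-o", "JobID,State"]


def parse_states(output: str) -> dict[str, str]:
    """Map JobID -> State from `sacct -X -n -P -o JobID,State` output."""
    states = {}
    for line in output.splitlines():
        job_id, sep, state = line.partition("|")
        if sep and state.strip():
            states[job_id] = state.split()[0]  # "CANCELLED by 1000" -> "CANCELLED"
    return states


def batch_states(job_ids: list[str]) -> dict[str, str]:
    """Query the states of all `job_ids` with a single sacct invocation."""
    result = subprocess.run(batch_sacct_cmd(job_ids), capture_output=True, text=True, check=False)
    return parse_states(result.stdout)
```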