[slurm-users] speed / efficiency of sacct vs. scontrol
David Laehnemann
david.laehnemann at hhu.de
Mon Feb 27 11:34:32 UTC 2023
Hi Chris, hi Sean,
thanks also (and thanks again) for chiming in.
Quick follow-up question:
Would `squeue` be a better fall-back command than `scontrol` from the
perspective of keeping `slurmctld` responsive? From what I can see in
the general overview of how Slurm works
(https://slurm.schedmd.com/overview.html), both query `slurmctld`. But
would one be "better" than the other, as in generating less work for
`slurmctld`? Or is it roughly the same amount of work, so that we can
simply choose whichever set of command-line arguments better suits
our needs?
Also, just as a quick heads-up: I am documenting your input by linking
to the mailing list archives. I hope that's alright with you?
https://github.com/snakemake/snakemake/pull/2136#issuecomment-1446170467
cheers,
david
On Sat, 2023-02-25 at 10:51 -0800, Chris Samuel wrote:
> On 23/2/23 2:55 am, David Laehnemann wrote:
>
> > And consequently, would using `scontrol` thus be the better default
> > option (as opposed to `sacct`) for repeated job status checks by a
> > workflow management system?
>
> Many others have commented on this, but use of scontrol in this way is
> really really bad because of the impact it has on slurmctld. This is
> because responding to the RPC (IIRC) requires taking read locks on
> internal data structures and on a large, busy system (like ours, we
> recently rolled over slurm job IDs back to 1 after ~6 years of operation
> and run at over 90% occupancy most of the time) this can really damage
> scheduling performance.
>
> We've had numerous occasions where we've had to track down users abusing
> scontrol in this way and redirect them to use sacct instead.
>
> We already use the cli filter abilities in Slurm to impose a form of
> rate limiting on RPCs from other commands, but unfortunately scontrol is
> not covered by that.
>
> All the best,
> Chris
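
For illustration, here is a rough sketch of what a batched `sacct` status
check in a workflow manager could look like, following the advice quoted
above. It is not the Snakemake implementation. The function names are
hypothetical, and it assumes accounting is enabled on the cluster. The idea
is to ask for all tracked jobs in a single `sacct` call per polling interval
instead of one call per job:

```python
import subprocess


def build_sacct_command(job_ids):
    """Build one sacct call covering all job IDs (hypothetical helper).

    -X: allocations only (no job steps), -P: '|'-delimited output,
    -n: no header line, -j: comma-separated job list.
    """
    joined = ",".join(str(j) for j in job_ids)
    return ["sacct", "-X", "-P", "-n", "--format=JobID,State", "-j", joined]


def parse_sacct_output(text):
    """Turn 'JobID|State' lines into a {job_id: state} dict."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, _, state = line.partition("|")
        # States such as "CANCELLED by 1000" carry extra words; keep the first.
        parts = state.split()
        states[job_id] = parts[0] if parts else "UNKNOWN"
    return states


def query_job_states(job_ids):
    """Run sacct once for all jobs; needs a cluster with accounting enabled."""
    result = subprocess.run(
        build_sacct_command(job_ids),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct_output(result.stdout)
```

A caller would run `query_job_states` on all pending job IDs once per
polling interval and sleep in between. What interval is reasonable depends
on the site, so it is left out here.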