[slurm-users] speed / efficiency of sacct vs. scontrol

Sean Maxwell stm at case.edu
Thu Feb 23 14:46:42 UTC 2023


Hi David,

On Thu, Feb 23, 2023 at 8:51 AM David Laehnemann <david.laehnemann at hhu.de>
wrote:

> Quick follow-up question: do you have any indication of the rate of job
> status checks via sacct that slurmdbd will gracefully handle (per
> second)? Or any suggestions how to roughly determine such a rate for a
> given cluster system?
>

I looked at your PR for context, and this line of snakemake looks
problematic (I know it isn't part of your PR; it is part of the original
code):
https://github.com/snakemake/snakemake/commit/a0f04bab08113196fe1616a621bd6bf20fc05688#diff-d1b47826c1fc35806df72508e2f5e7f1d0424f9b2f7b9124810b051f5fe97f1bL296

sacct_cmd = f"sacct -P -n --format=JobIdRaw,State -j {jobid}"

Since jobid is an int, it looks like snakemake will individually probe
each Slurm job it has launched. If snakemake used batch logic to gather
status for all your running jobs with one call to sacct, then you could
probably set the interval low. But as written it probes each job
individually by ID, so it will make as many RPC calls as there are jobs in
the pipeline each time it checks status.

I could be wrong, but this is how I read the code without going further
upstream.

Best,

-Sean

