[slurm-users] speed / efficiency of sacct vs. scontrol

David Laehnemann david.laehnemann at hhu.de
Mon Feb 27 22:05:15 UTC 2023


Hi Brian,

thanks for your ideas. Follow-up questions, because further digging
through the docs didn't get me anywhere definitive on this:

> IMHO, the true solution is that if a job's info NEEDS updated that
> often, have the job itself report what it is doing (but NOT via slurm
> commands). There are numerous ways to do that for most jobs.

Do you have any examples or suggestions of such ways without using
slurm commands?

> Perhaps there are some additional lines that could be added to the
> job that would do a call to a snakemake API and report itself? Or
> maybe such an API could be created/expanded.

One option that could work would be to use the `--wait` option of the
`sbatch` command that snakemake uses to submit jobs (`--wrap`ping the
respective shell command). In addition, `sbatch` would have to record
the "Job Accounting" info before exiting. It somehow does this
implicitly in the log file, although I am not sure how and where the
printing of this accounting info is configured, so I am not sure
whether this info will always be available in the logs or whether this
depends on a Slurm cluster's configuration. One could then have
snakemake wait for the process to finish, and only then parse the "Job
Accounting" info in the log file to determine what happened.

But this means we do not know the `JobId`s of submitted jobs in the
meantime, as the `JobId` is what `sbatch` usually returns upon
successful submission (when `--wait` is not used). As a result, things
like running `scancel` on all currently running jobs when we want to
stop a snakemake run become more difficult, because we don't have a
list of `JobId`s of currently active jobs. However, a single
run-specific `name` for all jobs of a run (as suggested by Sean) might
help, as `scancel` seems to allow cancelling by job name; see the
sketch below.
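
For illustration, here is a minimal sketch of how that combination
could look on the command line. The job name `smk-run-1234` and the
wrapped command are made-up placeholders, and I have not tested this,
so please take it as a rough idea rather than a verified recipe:

    # Submit a --wrap'ped command, tag it with a run-specific job name,
    # and block until the job finishes; with --wait, the exit code of
    # sbatch mirrors the exit code of the submitted job.
    sbatch --wait --job-name=smk-run-1234 \
           --wrap="my_rule_command input.txt output.txt"

    # To abort the whole run, cancel all of its jobs via their shared
    # name instead of via individual JobIds.
    scancel --name=smk-run-1234

Whether the "Job Accounting" summary then reliably ends up in the
job's log would, as said above, presumably still depend on the
cluster's configuration.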

But as one can hopefully see, there are no simple solutions. And to
me, the documentation is not that easy to parse, especially if you are
not already familiar with the terminology. I have also not really
found any best practices on how to query for or otherwise determine
job status (which is not to say they don't exist, but at least they
don't seem easy to find -- pointers are welcome). I'll try to document
whatever solution I come up with as well as I can, so that others can
hopefully reuse as much as possible in their own contexts. But maybe
some publicly available best practices (and no-gos) for Slurm cluster
users would be a useful resource that cluster admins could then point
/ link to.

cheers,
david


On Mon, 2023-02-27 at 06:53 -0800, Brian Andrus wrote:
> Sorry, I had to share that this is very much like "Are we there yet?"
> on a road trip with kids :)
> 
> Slurm is trying to drive. Any communication to slurmctld will involve
> an RPC call (sinfo, squeue, scontrol, etc.). You can see how many
> with sinfo. Too many RPC calls will cause failures. Asking slurmdbd
> will not do that to you. In fact, you could have a separate slurmdbd
> just for queries if you wanted. This is why that was suggested as a
> better option.
> 
> So, even if you run 'squeue' once every few seconds, it would impact
> the system. More so depending on the size of the system. We have had
> that issue with users running 'watch squeue' and had to address it.
> 
> IMHO, the true solution is that if a job's info NEEDS updated that
> often, have the job itself report what it is doing (but NOT via slurm
> commands). There are numerous ways to do that for most jobs.
> 
> Perhaps there are some additional lines that could be added to the
> job that would do a call to a snakemake API and report itself? Or
> maybe such an API could be created/expanded.
> 
> Just a quick 2 cents (We may be up to a few dollars with all of those
> so far).
> 
> Brian Andrus
> 
> 
> On 2/27/2023 4:24 AM, Ward Poelmans wrote:
> > On 24/02/2023 18:34, David Laehnemann wrote:
> > > Those queries then should not have to happen too often, although
> > > do you have any indication of a range for when you say "you still
> > > wouldn't want to query the status too frequently." Because I
> > > don't really, and would probably opt for some compromise of every
> > > 30 seconds or so.
> > 
> > I think this is exactly why HPC sysadmins are sometimes not very
> > happy about these tools. You're talking about 10000s of jobs on the
> > one hand, yet you want to fetch the status every 30 seconds? What
> > is the point of that other than overloading the scheduler?
> > 
> > We're telling our users not to query Slurm too often and usually
> > give 5 minutes as a good interval. You have to let Slurm do its
> > job. There is no point in querying in a loop every 30 seconds when
> > we're talking about large numbers of jobs.
> > 
> > 
> > Ward



