[slurm-users] speed / efficiency of sacct vs. scontrol

Chris Samuel chris at csamuel.org
Sat Feb 25 18:51:23 UTC 2023


On 23/2/23 2:55 am, David Laehnemann wrote:

> And consequently, would using `scontrol` thus be the better default
> option (as opposed to `sacct`) for repeated job status checks by a
> workflow management system?

Many others have commented on this, but using scontrol in this way is 
really bad because of the impact it has on slurmctld. Responding to the 
RPC (IIRC) requires taking read locks on internal data structures, and 
on a large, busy system this can really damage scheduling performance. 
Ours is one such system: we recently rolled over Slurm job IDs back to 1 
after ~6 years of operation and run at over 90% occupancy most of the 
time.

We've had numerous occasions where we've had to track down users abusing 
scontrol in this way and redirect them to use sacct instead.
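
For a workflow management system, that redirect could look roughly like 
the sketch below: one sacct call covering every tracked job, instead of 
one scontrol call per job. This is only an illustration under stated 
assumptions. The function names are hypothetical, the sacct options used 
(-X, --noheader, --parsable2, --format, --jobs) are standard ones, and it 
assumes job accounting is enabled so that sacct has data to return.

```python
import subprocess


def build_sacct_command(job_ids):
    """Build a single sacct invocation that covers all given job IDs."""
    return [
        "sacct",
        "-X",                    # allocations only, no per-step rows
        "--noheader",            # no column header line
        "--parsable2",           # '|'-delimited, no trailing delimiter
        "--format=JobID,State",
        "--jobs=" + ",".join(str(j) for j in job_ids),
    ]


def parse_sacct_output(text):
    """Map job ID -> state from 'JobID|State' lines."""
    states = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, _, state = line.partition("|")
        words = state.split()
        # States such as "CANCELLED by 1000" carry a suffix; keep the first word.
        states[job_id] = words[0] if words else ""
    return states


def poll_job_states(job_ids):
    """Hypothetical helper: query all tracked jobs in one sacct call."""
    result = subprocess.run(
        build_sacct_command(job_ids),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct_output(result.stdout)
```

Calling poll_job_states([101, 102, 103]) once per polling interval keeps 
the number of queries independent of the number of jobs being tracked. 
A generous interval between polls is still advisable.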

We already use the cli filter abilities in Slurm to impose a form of 
rate limiting on RPCs from other commands, but unfortunately scontrol is 
not covered by that.

All the best,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



