[slurm-users] ticking time bomb? launching too many jobs in parallel
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Aug 27 08:22:38 UTC 2019
The performance of slurmctld depends strongly on the server hardware it
runs on! This should be taken into account when considering your question.
SchedMD recommends that the slurmctld server have a few very fast CPU
cores in order to ensure the best responsiveness.
The file system for /var/spool/slurmctld/ should be mounted on the
fastest possible disks (SSD or NVMe if possible).
You should also read the Large Cluster Administration Guide in the Slurm
documentation.
Furthermore, it may be a good idea to install the MySQL database server
on a separate machine so that it doesn't slow down the slurmctld.
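Regarding the RPC-saturation question quoted below: a regular user can poll the controller's own statistics with `sdiag`. A sketch of what to look at, with sample `sdiag`-style output hard-coded for illustration (on a real cluster you would replace it with the live command output):

```shell
#!/bin/sh
# Sketch: check slurmctld load from the user side via `sdiag`.
# `sdiag` reports, among other things, the server thread count and the
# agent queue size (outbound RPCs waiting in slurmctld). An agent queue
# that stays above zero across repeated samples suggests the controller
# is not keeping up. The sample text below is illustrative; on a real
# cluster use:  stats="$(sdiag)"
stats="Server thread count: 3
Agent queue size:    0
Jobs submitted: 1234
Jobs started:   1200"

# Pull out the agent queue size.
queue=$(printf '%s\n' "$stats" | awk -F': *' '/Agent queue size/ {print $2}')
echo "agent queue size: $queue"
```

Sampling this every few seconds while your submission script runs gives a rough, user-level view of whether you are pushing the controller toward saturation.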
On 8/27/19 9:45 AM, Guillaume Perrault Archambault wrote:
> Hi Paul,
> Thanks a lot for your suggestion.
> The cluster I'm using has thousands of users, so I'm doubtful the admins
> will change this setting just for me. But I'll mention it to the support
> team I'm working with.
> I was hoping more for something that can be done on the user end.
> Is there some way for the user to measure whether the scheduler is in
> RPC saturation? And then if it is, I could make sure my script doesn't
> launch too many jobs in parallel.
> Sorry if my question is too vague; I don't understand the backend of
> the SLURM scheduler very well, so my questions use the limited
> terminology of a user.
> My concern is just to make sure that my scripts don't send out more
> commands (simultaneously) than the scheduler can handle.
> For example, as an extreme scenario, suppose a user forks off 1000
> sbatch commands in parallel. Is that more than the scheduler can
> handle? And as a user, how can I know whether it is?
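On the "1000 parallel sbatch commands" scenario, there are two user-side mitigations worth sketching. The script names below (`myjob.sh`, `job_1.sh`) are placeholders, and the loop uses `echo` in place of a real submission:

```shell
#!/bin/sh
# (1) Prefer one job array over many sbatch calls: a single submission
#     RPC reaches slurmctld, and the %N suffix caps how many array
#     tasks run concurrently. Standard sbatch syntax:
#
#       sbatch --array=1-1000%50 myjob.sh   # at most 50 tasks at once
#
# (2) If separate sbatch calls are unavoidable, pace them so the RPC
#     rate stays bounded rather than forking them all at once:
submitted=$(for i in $(seq 1 10); do
    # Replace 'echo' with the real sbatch invocation on a cluster.
    echo "sbatch job_${i}.sh"
    sleep 0.2                 # crude rate limit: ~5 submissions/sec
done)
printf '%s\n' "$submitted"
```

The job-array form is usually the better fix, since it also keeps `squeue` output and accounting records compact.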