<div dir="ltr">Thanks Ole for giving so much thought into my question. I'll pass a long these suggestions. Unfortunately as a user there's not a whole lot I can do about the choice of hardware.<div><br></div><div>Thanks for the link to the guide, I'll have a look at it. Even as a user it's helpful to be well informed on the admin side :)</div><div><br></div><div>Regards,</div><div>Guillaume.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 27, 2019 at 4:26 AM Ole Holm Nielsen <<a href="mailto:Ole.H.Nielsen@fysik.dtu.dk">Ole.H.Nielsen@fysik.dtu.dk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Guillaume,<br>

The performance of the slurmctld server depends strongly on the server hardware on which it is running! This should be taken into account when considering your question.

SchedMD recommends that the slurmctld server should have only a few, but very fast, CPU cores in order to ensure the best responsiveness.

The file system for /var/spool/slurmctld/ should be mounted on the fastest possible disks (SSD or NVMe if possible).
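
For example, something like this in slurm.conf (just a sketch; the path is the conventional location and should sit on your fastest local mount):

    # Keep slurmctld's state directory on fast local storage (SSD/NVMe),
    # not on NFS or other slow shared filesystems.
    StateSaveLocation=/var/spool/slurmctld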

You should also read the Large Cluster Administration Guide at https://slurm.schedmd.com/big_sys.html

Furthermore, it may be a good idea to have the MySQL database server installed on a separate server so that it doesn't slow down the slurmctld.
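
For example, a rough sketch assuming slurmdbd and MySQL run on a separate, hypothetical host named db01:

    # slurm.conf on the slurmctld host
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=db01

    # slurmdbd.conf on db01, with MySQL running locally on that host
    StorageType=accounting_storage/mysql
    StorageHost=localhost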

Best regards,
Ole

On 8/27/19 9:45 AM, Guillaume Perrault Archambault wrote:
> Hi Paul,
>
> Thanks a lot for your suggestion.
>
> The cluster I'm using has thousands of users, so I'm doubtful the admins will change this setting just for me. But I'll mention it to the support team I'm working with.
>
> I was hoping more for something that can be done on the user end.
>
> Is there some way for the user to measure whether the scheduler is in RPC saturation? And then, if it is, I could make sure my script doesn't launch too many jobs in parallel.
>
> Sorry if my question is too vague; I don't understand the backend of the SLURM scheduler very well, so my questions use the limited terminology of a user.
>
> My concern is just to make sure that my scripts don't send out more commands (simultaneously) than the scheduler can handle.
>
> For example, as an extreme scenario, suppose a user forks off 1000 sbatch commands in parallel: is that more than the scheduler can handle? As a user, how can I know whether it is?
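
For the user-side questions quoted above, a rough sketch (the job script names are hypothetical placeholders):

    # sdiag prints slurmctld statistics, including server thread count,
    # agent queue size and per-message-type RPC counts; a persistently
    # full thread pool or a growing agent queue suggests RPC saturation.
    sdiag | grep -E 'Server thread count|Agent queue size'

    # Throttle submissions so that at most 8 sbatch commands run at once.
    ls job_*.sh | xargs -n 1 -P 8 sbatch

Where many similar jobs are involved, a job array (e.g. sbatch --array=1-1000 job.sh) submits them in a single call and is much lighter on the controller than forking 1000 separate sbatch processes.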