<div dir="ltr">Thanks, Ole, for putting so much thought into my question. I'll pass along these suggestions. Unfortunately, as a user, there's not a whole lot I can do about the choice of hardware.<div><br></div><div>Thanks for the link to the guide; I'll have a look at it. Even as a user, it's helpful to be well informed on the admin side :)</div><div><br></div><div>Regards,</div><div>Guillaume.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 27, 2019 at 4:26 AM Ole Holm Nielsen <<a href="mailto:Ole.H.Nielsen@fysik.dtu.dk">Ole.H.Nielsen@fysik.dtu.dk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Guillaume,<br>

<br>

The performance of the slurmctld server depends strongly on the server <br>

hardware on which it is running!  This should be taken into account when <br>

considering your question.<br>

<br>

SchedMD recommends that the slurmctld server have only a few, but very <br>

fast, CPU cores to ensure the best responsiveness.<br>

<br>

The file system for /var/spool/slurmctld/ should be mounted on the <br>

fastest possible disks (SSD or NVMe if possible).<br>

<br>

You should also read the Large Cluster Administration Guide at <br>

<a href="https://slurm.schedmd.com/big_sys.html" rel="noreferrer" target="_blank">https://slurm.schedmd.com/big_sys.html</a><br>

<br>

Furthermore, it may be a good idea to install the MySQL database <br>

server on a separate host so that it doesn't slow down <br>

slurmctld.<br>

<br>

Best regards,<br>

Ole<br>

<br>

On 8/27/19 9:45 AM, Guillaume Perrault Archambault wrote:<br>

> Hi Paul,<br>

> <br>

> Thanks a lot for your suggestion.<br>

> <br>

> The cluster I'm using has thousands of users, so I'm doubtful the admins <br>

> will change this setting just for me. But I'll mention it to the support <br>

> team I'm working with.<br>

> <br>

> I was hoping more for something that can be done on the user end.<br>

> <br>

> Is there some way for the user to measure whether the scheduler is in <br>

> RPC saturation? And then if it is, I could make sure my script doesn't <br>

> launch too many jobs in parallel.<br>
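One rough, user-side proxy (an assumption on my part, not a real measure of RPC saturation) is to time a cheap read-only client command such as `squeue -h -u <user>` and back off while it is slow. A minimal Python sketch; the 2-second threshold, the back-off schedule and the function names `probe_latency` and `wait_until_responsive` are hypothetical choices for illustration, not Slurm defaults:

```python
import getpass
import subprocess
import time


def probe_latency(cmd=None, run=subprocess.run, clock=time.monotonic):
    """Return the wall-clock seconds taken by one cheap Slurm client command.

    Hypothetical proxy: a slow squeue suggests a busy slurmctld, but it is
    not a direct measure of RPC saturation.
    """
    if cmd is None:
        cmd = ["squeue", "-h", "-u", getpass.getuser()]
    start = clock()
    run(cmd, capture_output=True, text=True)  # output is ignored; only timing matters
    return clock() - start


def wait_until_responsive(threshold=2.0, max_wait=300.0, probe=probe_latency,
                          sleep=time.sleep):
    """Sleep with exponential back-off until probe() is faster than threshold."""
    delay, waited = 1.0, 0.0
    while True:
        latency = probe()
        if latency < threshold:
            return latency
        if waited >= max_wait:
            raise TimeoutError(f"controller still slow after {waited:.0f} s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, 60.0)  # cap the back-off at one minute
```

A submission script could call `wait_until_responsive()` before each batch of `sbatch` calls. The `run`, `clock`, `probe` and `sleep` parameters exist only so the logic can be exercised without a cluster.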

> <br>

> Sorry if my question is too vague; I don't understand the backend of the <br>

> SLURM scheduler very well, so my questions use the limited <br>

> terminology of a user.<br>

> <br>

> My concern is just to make sure that my scripts don't send out more <br>

> commands (simultaneously) than the scheduler can handle.<br>

> <br>

> For example, as an extreme scenario, suppose a user forks off 1000 <br>

> sbatch commands in parallel: is that more than the scheduler can handle? <br>

> As a user, how can I know whether it is?<br>
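Rather than forking 1000 `sbatch` processes at once, a script can cap how many submissions are in flight and retry failures with exponential back-off. A minimal Python sketch, assuming `sbatch --parsable` is available on the login node; the helper names `submit_one` and `submit_all`, the cap of 4 and the delays are hypothetical and not Slurm recommendations:

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def submit_one(script, run=subprocess.run, retries=5, base_delay=2.0,
               sleep=time.sleep):
    """Submit one batch script; retry with exponential back-off on failure."""
    last_error = ""
    for attempt in range(retries):
        # --parsable makes sbatch print just the job ID (plus ";cluster" if set)
        result = run(["sbatch", "--parsable", script],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
        last_error = result.stderr.strip()
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
    raise RuntimeError(f"sbatch failed for {script}: {last_error}")


def submit_all(scripts, max_parallel=4, **kwargs):
    """Submit many scripts with at most max_parallel sbatch calls in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps results in the same order as `scripts`
        return list(pool.map(partial(submit_one, **kwargs), scripts))
```

For example, `submit_all(["a.sh", "b.sh"], max_parallel=2)` returns the job IDs in input order. The `run` and `sleep` parameters are there only so the logic can be tested without a real cluster.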

<br>

</blockquote></div>