[slurm-users] Resource Limits

Jason Simms jsimms1 at swarthmore.edu
Thu Apr 20 18:11:10 UTC 2023


Hello Ole and Hoot,

First, Hoot, thank you for your question. I've managed Slurm for a few
years now and still don't feel I have a great understanding of how to
manage or limit resources.

Ole, thanks for your continued support of the user community with your
documentation. I do wish that more of your information were included in
the official docs, and that the docs offered even clearer discussion of
certain topics.

As an example, you write that "It is important to configure slurm.conf so
that the locked memory limit isn’t propagated to the batch jobs" by
setting PropagateResourceLimitsExcept=MEMLOCK. It's unclear to me whether
you are suggesting that literally everyone should have that set, or whether
it only applies to certain configurations. We don't have it set, for
instance, but we've not run into trouble with jobs failing due to locked
memory errors.
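For reference, the setting Ole describes is a single line in slurm.conf. The sketch below assumes only the locked-memory limit should be withheld from jobs. The usual motivation is that login nodes often have a small default locked-memory limit. If that limit is propagated to jobs, it can break RDMA/InfiniBand-based MPI. Whether that applies depends on the site's login-node limits and interconnect:

```
# slurm.conf (sketch): propagate the submitting shell's resource limits
# to jobs, except the locked-memory limit (RLIMIT_MEMLOCK); jobs then
# inherit the limit in effect for slurmd on the compute node instead
PropagateResourceLimitsExcept=MEMLOCK
```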

The official docs you link to also say that "it may also be
desirable to lock the slurmd daemon's memory to help ensure that it keeps
responding if memory swapping begins" by creating /etc/sysconfig/slurm
containing the line SLURMD_OPTIONS="-M". Would there ever be a reason *not*
to include that? That is, I can't imagine it would ever be desirable for
slurmd to stop responding. So is that another "universal" recommendation, I
wonder?
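The environment file quoted from the official docs would look like the sketch below. It assumes the slurmd init script or systemd unit reads this file and passes $SLURMD_OPTIONS to slurmd. Check the EnvironmentFile line of the installed unit, because some packagings read /etc/sysconfig/slurmd instead:

```
# /etc/sysconfig/slurm (sketch): extra command-line options for slurmd.
# -M asks slurmd to lock its pages into memory (mlockall) so the daemon
# is not paged out and keeps responding if the node starts swapping.
SLURMD_OPTIONS="-M"
```

slurmd only picks up the option the next time it is restarted.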

It may be me talking as a new-ish user, but I would find it helpful to
have a concise document laying out common or useful configuration options
to consider when setting up or reconfiguring Slurm. I'm certain I have
options that are inefficient or missing altogether.

Warmest regards,
Jason

On Thu, Apr 20, 2023 at 2:11 AM Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
wrote:

> Hi Hoot,
>
> On 4/20/23 00:15, Hoot Thompson wrote:
> > Is there a ‘how to’ or recipe document for setting up and enforcing
> > resource limits? I can establish accounts, users, and set limits but
> > 'current value' is not incrementing after running jobs.
>
> I have written about resource limits in this Wiki page:
>
> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#partition-limits
>
> IHTH,
> Ole
>
>
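On Hoot's original question, one slurm.conf setting commonly involved when association limits seem to have no effect is AccountingStorageEnforce. The sketch below assumes accounting is stored through slurmdbd. Whether this setting explains the 'current value' behavior in Hoot's case is an assumption, not something established in this thread:

```
# slurm.conf (sketch): store accounting via slurmdbd and enforce
# association and QOS limits ("limits" also implies "associations")
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=limits,qos
```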

-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms