[slurm-users] Resource Limits
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Apr 21 08:43:44 UTC 2023
Hi Jason,
On 4/20/23 20:11, Jason Simms wrote:
> Hello Ole and Hoot,
>
> First, Hoot, thank you for your question. I've managed Slurm for a few
> years now and still feel like I don't have a great understanding about
> managing or limiting resources.
>
> Ole, thanks for your continued support of the user community with your
> documentation. I do wish not only that more of your information were
> contained within the official docs, but also that there were even clearer
> discussions around certain topics.
>
> As an example, you write that "It is important to configure slurm.conf so
> that the locked memory limit isn’t propagated to the batch jobs" by
> setting PropagateResourceLimitsExcept=MEMLOCK. It's unclear to me whether
> you are suggesting that literally everyone should have that set, or
> whether it only applies to certain configurations. We don't have it set,
> for instance, but we've not run into trouble with jobs failing due to
> locked memory errors.
The Slurm FAQ entry linked from my Wiki page hopefully explains it:
https://slurm.schedmd.com/faq.html#memlock
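
If you want to see whether this matters at your site, one rough check is
to print the locked memory limit both in a login shell and inside a batch
job, and compare the two. Below is a minimal sketch using Python's standard
resource module; the function names are my own invention for illustration,
not anything provided by Slurm:

```python
import resource


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits():
    """Return the (soft, hard) locked memory limits of this process as text."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"max locked memory: soft={soft}, hard={hard}")
```

If the job reports the same small limit as your login shell, the limit was
most likely propagated from the submitting session, which is the situation
PropagateResourceLimitsExcept=MEMLOCK is meant to avoid.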
> Then, in the official docs, to which you link, it says that "it may also
> be desirable to lock the slurmd daemon's memory to help ensure that it
> keeps responding if memory swapping begins" by creating
> /etc/sysconfig/slurm containing the line SLURMD_OPTIONS="-M". Would there
> ever be a reason *not* to include that? That is, I can't think it would
> ever be desirable for slurmd to stop responding. So is that another
> "universal" recommendation, I wonder?
I'm not an expert on locking slurmd pages! The -M option is documented in
the slurmd manual page, and I probably read a thread about it on the
slurm-users mailing list long ago. You could try it out in your
environment and see if all is well.
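
One way to check the effect after restarting slurmd might be to look at the
VmLck field in /proc/<pid>/status on a compute node. This is only a sketch
that assumes a Linux node; locked_kb is a hypothetical helper name, and you
have to supply the slurmd PID yourself (for example from "pidof slurmd"):

```python
import sys


def locked_kb(status_text):
    """Return the VmLck value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            # The line looks like "VmLck:      1234 kB"
            return int(line.split()[1])
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]  # PID of the slurmd process, passed on the command line
    with open(f"/proc/{pid}/status") as status_file:
        print(f"locked memory of PID {pid}: {locked_kb(status_file.read())} kB")
```

A non-zero value with -M set, compared to 0 kB without it, would suggest
that the daemon's memory is in fact being locked.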
> It may be me talking as a new-ish user, but I would find a concise
> document laying out common or useful configuration options to be presented
> when setting up or reconfiguring Slurm. I'm certain I have inefficient or
> missing options that I should have.
IMHO, most sites have their own requirements and preferences, so I don't
think there is a one-size-fits-all Slurm installation solution.
Since requirements can be so different, and because Slurm is a fantastic
piece of software that can be configured for many different scenarios,
IMHO a support contract with SchedMD is the best way to get consulting
services, get general help, and report bugs. We have had excellent
experiences with SchedMD support (https://www.schedmd.com/support.php).
Best regards,
Ole
> On Thu, Apr 20, 2023 at 2:11 AM Ole Holm Nielsen
> <Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
> Hi Hoot,
>
> On 4/20/23 00:15, Hoot Thompson wrote:
> > Is there a ‘how to’ or recipe document for setting up and enforcing
> resource limits? I can establish accounts, users, and set limits but
> 'current value' is not incrementing after running jobs.
>
> I have written about resource limits in this Wiki page:
> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#partition-limits