[slurm-users] Resource Limits

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Apr 21 08:43:44 UTC 2023


Hi Jason,

On 4/20/23 20:11, Jason Simms wrote:
> Hello Ole and Hoot,
> 
> First, Hoot, thank you for your question. I've managed Slurm for a few 
> years now and still feel like I don't have a great understanding about 
> managing or limiting resources.
> 
> Ole, thanks for your continued support of the user community with your 
> documentation. I do wish not only that more of your information were 
> contained within the official docs, but also that there were even clearer 
> discussions around certain topics.
> 
> As an example, you write that "It is important to configure slurm.conf so 
> that the locked memory limit isn’t propagated to the batch jobs" by 
> setting PropagateResourceLimitsExcept=MEMLOCK. It's unclear to me whether 
> you are suggesting that literally everyone should have that set, or 
> whether it only applies to certain configurations. We don't have it set, 
> for instance, but we've not run into trouble with jobs failing due to 
> locked memory errors.

The link mentioned in the page hopefully explains it: 
https://slurm.schedmd.com/faq.html#memlock
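The FAQ's concern is which locked-memory limit the job processes actually end up with. One way to see that at a given site is to print the limit from inside a job step, for example with `srun python3 memlock.py`. The following is a minimal sketch that uses only Python's standard `resource` module; the file name `memlock.py` is hypothetical.

```python
# memlock.py (hypothetical name): report the RLIMIT_MEMLOCK limits this process inherited.
# Run it inside a job, e.g. "srun python3 memlock.py", to see what job tasks actually get.
import resource


def describe(limit: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


def memlock_limits() -> tuple[int, int]:
    """Return the (soft, hard) locked-memory limits in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"RLIMIT_MEMLOCK soft={describe(soft)} hard={describe(hard)}")
    if soft != resource.RLIM_INFINITY:
        # A finite soft limit here may be the submitting shell's limit propagated by
        # Slurm, which is the situation PropagateResourceLimitsExcept=MEMLOCK avoids.
        print("note: soft memlock limit is finite")
```

If the value printed inside jobs matches a low limit from the login shell, that may indicate the propagation issue the FAQ describes; if it is already what MPI or InfiniBand codes need, the setting may simply not matter at that site.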

> Then, in the official docs, to which you link, it says that "it may also 
> be desirable to lock the slurmd daemon's memory to help ensure that it 
> keeps responding if memory swapping begins" by creating 
> /etc/sysconfig/slurm containing the line SLURMD_OPTIONS="-M". Would there 
> ever be a reason *not* to include that? That is, I can't think it would 
> ever be desirable for slurmd to stop responding. So is that another 
> "universal" recommendation, I wonder?

I'm not an expert on locking slurmd pages!  The -M option is documented in 
the slurmd manual page, and I probably read a thread about it long ago on 
the slurm-users mailing list.  You could try it out in your environment 
and see if all is well.
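Trying -M out can include checking that it actually took effect. The slurmd manual page describes -M as locking the daemon's pages into memory with mlockall(2); on Linux, the amount of locked memory for a process appears on the `VmLck:` line of `/proc/<pid>/status`. The sketch below assumes a Linux node and that the daemon's process name is `slurmd`; it only reads /proc and does not talk to Slurm.

```python
# Sketch: report locked memory (VmLck) of every process named "slurmd".
# ASSUMPTIONS: Linux /proc layout; the daemon's comm name is "slurmd".
import os
import re
from typing import Optional

VMLCK_RE = re.compile(r"^VmLck:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmlck_kb(status_text: str) -> Optional[int]:
    """Return the VmLck value (kB) from /proc/<pid>/status text, or None if absent."""
    match = VMLCK_RE.search(status_text)
    return int(match.group(1)) if match else None


def find_pids(name: str) -> list[int]:
    """Return the PIDs whose /proc/<pid>/comm equals name."""
    if not os.path.isdir("/proc"):
        return []
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited or its comm file is unreadable
    return pids


if __name__ == "__main__":
    for pid in find_pids("slurmd"):
        try:
            with open(f"/proc/{pid}/status") as f:
                locked_kb = parse_vmlck_kb(f.read())
        except OSError:
            continue
        # Non-zero VmLck means some pages are locked, as expected with "slurmd -M".
        print(f"slurmd pid {pid}: VmLck = {locked_kb} kB")
```

Run on a compute node after restarting slurmd with the option; a VmLck of 0 kB would suggest the option was not picked up.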

> It may be me talking as a new-ish user, but I would find it very helpful 
> to have a concise document laying out common or useful configuration 
> options to consult when setting up or reconfiguring Slurm. I'm certain I 
> have inefficient settings or am missing options that I should have.

IMHO, most sites have their own requirements and preferences, so I don't 
think there is a one-size-fits-all Slurm installation solution.
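Even without a universal recipe, a site can write down its own choices and check a cluster against them. The sketch below parses the `Key = Value` lines printed by `scontrol show config` and compares them with a dictionary of desired settings; the `DESIRED` entry is a hypothetical placeholder taken from this thread, not a recommendation.

```python
# Sketch: compare "scontrol show config" output with a site's own list of desired settings.
import subprocess

# Hypothetical site policy; replace with whatever your site has decided on.
DESIRED = {
    "PropagateResourceLimitsExcept": "MEMLOCK",
}


def parse_config(text: str) -> dict[str, str]:
    """Parse 'Key = Value' lines; lines without ' = ' (headers, blanks) are skipped."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            config[key.strip()] = value.strip()
    return config


def mismatches(config: dict[str, str], desired: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (desired, actual)} for every setting that differs or is missing."""
    return {
        key: (want, config.get(key))
        for key, want in desired.items()
        if config.get(key) != want
    }


if __name__ == "__main__":
    out = subprocess.run(["scontrol", "show", "config"],
                         capture_output=True, text=True, check=True).stdout
    for key, (want, got) in mismatches(parse_config(out), DESIRED).items():
        print(f"{key}: want {want!r}, have {got!r}")
```

Keeping the policy in a small dictionary like this makes the site's choices explicit and reviewable, which is closer to the "concise document" idea than a generic recipe.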

Since requirements can be so different, and because Slurm is fantastic 
software that can be configured for many different scenarios, IMHO a 
support contract with SchedMD is the best way to get consulting services, 
get general help, and report bugs.  We have excellent experiences with 
SchedMD support (https://www.schedmd.com/support.php).

Best regards,
Ole

> On Thu, Apr 20, 2023 at 2:11 AM Ole Holm Nielsen 
> <Ole.H.Nielsen at fysik.dtu.dk> wrote:
> 
>     Hi Hoot,
> 
>     On 4/20/23 00:15, Hoot Thompson wrote:
>      > Is there a ‘how to’ or recipe document for setting up and enforcing
>     resource limits? I can establish accounts, users, and set limits but
>     'current value' is not incrementing after running jobs.
> 
>     I have written about resource limits in this Wiki page:
>     https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#partition-limits
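To watch the "current value" Hoot mentions, one option is `scontrol show assoc_mgr`, which prints association limits together with their current usage. The format assumed here is `Limit=max(current)`, with `N` meaning no limit (e.g. `GrpJobs=N(0)`); treat that as an assumption and check it against your own output. The sketch extracts only scalar fields and deliberately skips TRES lists such as `GrpTRES=cpu=N(0),mem=N(0)`.

```python
# Sketch: extract "Limit=max(current)" pairs from `scontrol show assoc_mgr` text.
# ASSUMPTION: scalar limits print as e.g. "GrpJobs=N(0)" where N means "no limit".
import re
import subprocess

# The lookbehind skips names inside TRES lists such as "GrpTRES=cpu=N(0),mem=N(0)".
PAIR_RE = re.compile(r"(?<![\w=,])(\w+)=(\w+)\(([\d.]+)\)")


def limits_with_usage(text: str) -> list[tuple[str, str, str]]:
    """Return (name, limit, current) triples, e.g. ("GrpJobs", "N", "0")."""
    return PAIR_RE.findall(text)


if __name__ == "__main__":
    out = subprocess.run(["scontrol", "show", "assoc_mgr", "flags=assoc"],
                         capture_output=True, text=True, check=True).stdout
    for name, limit, current in limits_with_usage(out):
        print(f"{name:<15} limit={limit:<6} current={current}")
```

Running this before and during a job shows whether the numbers in parentheses move; if they stay at zero, the partition-limits Wiki page is the place to compare your configuration.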