[slurm-users] [External] Re: PropagateResourceLimits

Prentice Bisbal pbisbal at pppl.gov
Thu Apr 29 17:22:03 UTC 2021


What I said in my last e-mail (which you probably haven't gotten to yet) 
is similar to this case. On its own, Slurm wouldn't propagate resource 
limits, but that has been added as a feature. In your case, Slurm has 
functionality built into it where you can tell it to use PAM. Without 
that functionality enabled, Slurm would bypass PAM; with it enabled as 
you have done, PAM does come into play.

This is similar to SSH, where you can enable the UsePAM feature.
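
For reference, enabling it looks something like this (just a sketch 
based on my reading of the slurm.conf man page; the exact PAM stack is 
up to your site, but pam_limits.so is the piece that applies 
/etc/security/limits.conf to the spawned tasks):

    # slurm.conf on the compute nodes
    UsePAM=1

    # /etc/pam.d/slurm (example stack)
    auth     required  pam_localuser.so
    account  required  pam_unix.so
    session  required  pam_limits.so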

From my reading of the documentation for PropagateResourceLimits, I 
think Slurm looks at the limits in effect in the environment where the 
job is submitted, not at /etc/security/limits.conf via PAM on the 
compute node. In my previous e-mail, I suggested a way to test this, 
but I haven't tested it myself. Yet.
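
For what it's worth, the test I had in mind was roughly this (untested, 
so treat it as a sketch): set a distinctive soft limit in the shell on 
the submit node and see whether a job step inherits it.

    # On the submit node: set an unusual soft nofile limit, then
    # run a job step that reports the same limit.
    ulimit -S -n 1111
    srun bash -c 'ulimit -S -n'

    # With PropagateResourceLimits=ALL (the default), I'd expect the
    # step to report 1111, i.e. the submission shell's limit. With
    # PropagateResourceLimits=NONE, you should instead see whatever
    # PAM/limits.conf or the slurmd environment sets on the compute
    # node.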


Prentice

On 4/29/21 12:54 PM, Ryan Novosielski wrote:
> It may not for specifically PropagateResourceLimits – as I said, the docs are a little sparse on the “how” this actually works – but you’re not correct that PAM doesn’t come into play re: user jobs. If you have “UsePam = 1” set, and have an /etc/pam.d/slurm, as our site does, there is some amount of interaction here, and PAM definitely affects user jobs.
>
>> On Apr 27, 2021, at 11:31 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>> I don't think PAM comes into play here. Since Slurm is starting the processes on the compute nodes as the user, etc., PAM is being bypassed.
>>
>> Prentice
>>
>>
>> On 4/22/21 10:55 AM, Ryan Novosielski wrote:
>>> My recollection is that this parameter is talking about “ulimit” parameters, and doesn’t have to do with cgroups. The documentation is not as clear here as it could be, about what this does, the mechanism by which it’s applied (PAM module), etc.
>>>
>>> Sent from my iPhone
>>>
>>>> On Apr 22, 2021, at 09:07, Diego Zuccato <diego.zuccato at unibo.it> wrote:
>>>>
>>>> Hello all.
>>>>
>>>> I'd need a clarification about PropagateResourceLimits.
>>>> If I set it to NONE, will cgroup still limit the resources a job can use on the worker node(s), actually decoupling limits on the frontend from limits on the worker nodes?
>>>>
>>>> I've been bitten by the default being ALL: when I tried to limit the memory users can use on the frontend to 1GB soft / 4GB hard, jobs began to fail at startup even if they requested 200G (which is available on the worker nodes but not on the frontend)...
>>>>
>>>> Tks.
>>>>
>>>> -- 
>>>> Diego Zuccato
>>>> DIFA - Dip. di Fisica e Astronomia
>>>> Servizi Informatici
>>>> Alma Mater Studiorum - Università di Bologna
>>>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>>>> tel.: +39 051 20 95786
>>>>


