We do have different limits on the submit host, and I believe that until we put the `limits.d/99-cluster.conf` file in place the submit host limits were passed to jobs, but I can't say for sure; it was a long time ago. Still, modifying `limits.d` on the cluster nodes may be a different approach and a solution to the aforementioned issue.
I wonder if anyone has an opinion on which way is better and why - whether to modify slurm.conf or the node system limits.
Patryk.
On 24/05/17 09:30, greent10--- via slurm-users wrote:
Hi,
The problem arises if the login nodes (or submission hosts) have different ulimits - for example, if the submission hosts are VMs rather than physical servers. By default, Slurm passes the submission host's ulimits on to the job on the compute node, which can result in different settings being applied. If the login nodes all have the same ulimit settings, you may not see a difference.
We happened to see a difference after moving to a virtualised login node infrastructure, which has slightly different settings applied.
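A quick, purely illustrative way to see this propagation is to compare the limits on the submission host with what a job actually receives (generic commands, nothing specific to any one site):

    # limits on the login/submission host
    ulimit -a

    # limits as seen from inside a job on a compute node
    srun bash -c 'ulimit -a'

Comparing the two outputs shows which limits a job actually ends up with.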
Does that make sense?
I had also missed that setting in slurm.conf, so it is good to know it is possible to change the default behaviour.
Tom
From: Patryk Bełzak via slurm-users <slurm-users@lists.schedmd.com>
Date: Friday, 17 May 2024 at 10:15
To: Dj Merrill <slurm@deej.net>
Cc: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: srun weirdness
Hi,
I wonder where these problems come from; perhaps I am missing something, but we have never had such issues with limits, since we set them on the worker nodes in /etc/security/limits.d/99-cluster.conf:
    *  soft  memlock  4086160    #Allow more Memory Locks for MPI
    *  hard  memlock  4086160    #Allow more Memory Locks for MPI
    *  soft  nofile   1048576    #Increase the Number of File Descriptors
    *  hard  nofile   1048576    #Increase the Number of File Descriptors
    *  soft  stack    unlimited  #Set soft to hard limit
    *  soft  core     4194304    #Allow Core Files
and it sets up all the limits we want without any problems; there is no need to pass extra arguments to Slurm commands or modify the config file.
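(For completeness, the "extra arguments" route would be something along the lines of the --propagate option of srun and sbatch; the binary and job script names below are placeholders, and the values are only examples:

    # propagate no submit-host limits for this particular job
    srun --propagate=NONE ./my_app

    # or propagate only the locked-memory limit
    sbatch --propagate=MEMLOCK job.sh

Setting the limits once on the nodes avoids having to remember these per job.)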
Regards, Patryk.
On 24/05/15 02:26, Dj Merrill via slurm-users wrote:
I completely missed that, thank you!
-Dj
Laura Hild via slurm-users wrote:
PropagateResourceLimitsExcept won't do it?
Sarlo, Jeffrey S wrote:
You might look at the PropagateResourceLimits and PropagateResourceLimitsExcept settings in slurm.conf
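For reference, a minimal sketch of how that could look in slurm.conf - the option names are real, but the values are only illustrative, and typically you would set one or the other:

    # Do not propagate any resource limits from the submit host;
    # jobs then run with the compute node's own limits.
    PropagateResourceLimits=NONE

    # ...or propagate everything except the locked-memory limit:
    # PropagateResourceLimitsExcept=MEMLOCK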