Hi,

 

The problem arises when the login nodes (or submission hosts) have different ulimits – for example if the submission hosts are VMs rather than physical servers. By default, Slurm propagates the ulimits from the submission host to the job on the compute node, which can result in different settings being applied. If the login nodes have the same ulimit settings, you may not see a difference.
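
A quick way to see whether this is what is happening is to compare the limits on the submission host with those inside a job. A minimal check, assuming a working srun on the cluster:

```
# On the login/submission node
ulimit -a

# Inside a job on a compute node – by default these are the limits
# propagated from the shell above, not the node's own settings
srun bash -c 'ulimit -a'
```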

 

We happened to see a difference because we moved to a virtualised login node infrastructure, which has slightly different settings applied.

 

Does that make sense?

 

I also missed that setting in slurm.conf, so it is good to know it is possible to change the default behaviour.
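
For reference, something like the following in slurm.conf stops selected limits being propagated from the submission host, so the compute node's own settings apply instead. The list of limits here is only an illustration – which ones to exclude is site-specific:

```
# slurm.conf – do not propagate these limits from the submission host;
# the node's own settings (e.g. /etc/security/limits.d) apply for them.
# PropagateResourceLimits=NONE would disable propagation entirely.
PropagateResourceLimitsExcept=MEMLOCK,NOFILE,STACK
```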


Tom

 

From: Patryk Bełzak via slurm-users <slurm-users@lists.schedmd.com>
Date: Friday, 17 May 2024 at 10:15
To: Dj Merrill <slurm@deej.net>
Cc: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: srun weirdness




Hi,

I wonder where this problem comes from – perhaps I am missing something, but we have never had such issues with limits, since we set them on the worker nodes in /etc/security/limits.d/99-cluster.conf:

```
*       soft    memlock 4086160 #Allow more Memory Locks for MPI
*       hard    memlock 4086160 #Allow more Memory Locks for MPI
*       soft    nofile  1048576 #Increase the Number of File Descriptors
*       hard    nofile  1048576 #Increase the Number of File Descriptors
*       soft    stack   unlimited       #Set soft to hard limit
*       soft    core    4194304 #Allow Core Files
```

and that sets all the limits we want without any problems; there is no need to pass extra arguments to Slurm commands or to modify the config file.
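
A rough way to confirm what a job actually ends up with on a worker node (and whether anything is still being overridden by limits propagated from the submission host) is to inspect the running processes directly – assuming a job is currently running on the node:

```
# Effective limits of slurmd itself on the worker node
cat /proc/$(pgrep -o slurmd)/limits

# Effective limits of a running job step (slurmstepd), i.e. what the job got
cat /proc/$(pgrep -o slurmstepd)/limits
```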

Regards,
Patryk.

On 24/05/15 02:26, Dj Merrill via slurm-users wrote:
> I completely missed that, thank you!
>
> -Dj
>
>
> Laura Hild via slurm-users wrote:
> > PropagateResourceLimitsExcept won't do it?
> Sarlo, Jeffrey S wrote:
> > You might look at the PropagateResourceLimits and PropagateResourceLimitsExcept settings in slurm.conf

