Hi everyone,
I saw similar discussions in the archives, but I could not find a clear explanation, so I decided to open a new thread.
The problem is that the default ulimits defined in the compute node OS image are overridden by Slurm, and changes in slurm.conf do not resolve it.
This is specifically about max memory size, but the same issue applies to stack size and user processes as well.
From the head node, or from one of the compute nodes if I bypass Slurm:
    max memory size (kbytes, -m) unlimited
But going through salloc / sbatch it becomes:
    max memory size (kbytes, -m) 1024
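For anyone who wants to reproduce the comparison, here is a minimal Python sketch that prints the soft and hard values of the three limits mentioned above. It assumes a Linux node and that bash's `ulimit -m`, `-s` and `-u` correspond to `RLIMIT_RSS`, `RLIMIT_STACK` and `RLIMIT_NPROC`. The helper names `fmt` and `report_limits` are made up for illustration and are not part of Slurm.

```python
import resource

# Limits named in the post, mapped to (rlimit constant, divisor).
# Assumption: bash `ulimit -m` / `-s` report kbytes while getrlimit()
# returns bytes, so those two are divided by 1024; `-u` is a plain count.
LIMITS = {
    "max memory size (kbytes, RLIMIT_RSS)": (resource.RLIMIT_RSS, 1024),
    "stack size (kbytes, RLIMIT_STACK)": (resource.RLIMIT_STACK, 1024),
    "max user processes (RLIMIT_NPROC)": (resource.RLIMIT_NPROC, 1),
}


def fmt(value, divisor):
    """Render one limit value the way `ulimit` does."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // divisor)


def report_limits():
    """Return {label: (soft, hard)} for the limits listed in LIMITS."""
    rows = {}
    for label, (res, divisor) in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        rows[label] = (fmt(soft, divisor), fmt(hard, divisor))
    return rows


if __name__ == "__main__":
    for label, (soft, hard) in report_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

Running the script once directly on a node and once through `srun` or inside an `sbatch` script (the file name is up to you) should show whether only the soft limit changes or the hard limit as well.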
As an initial attempt I tried setting `PropagateResourceLimitsExcept=NONE`, but it didn't help. Then I tried `PropagateResourceLimits=ALL`, also with no luck.
So in theory PropagateResourceLimits should propagate all limits, but I'm not sure if that's really the case.
I'm open to suggestions,
Fatih Ertinaz
Note: Slurm version is 24.05.1; OS is SLES SP5.
Hi Fatih,
There is some discussion of limits in these Wiki pages, maybe you'll find them useful:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#job-limits
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#slurmd-system...
/Ole
On 1/30/25 21:31, Fatih Ertinaz via slurm-users wrote: