wagner at itc.rwth-aachen.de
Thu Mar 23 11:58:23 UTC 2023
No, we don't use --propagate.
In slurm.conf, we set
In effect, that means we do not propagate any limits other than the
core size (excerpt from the slurm.conf man page):
> If neither PropagateResourceLimits or PropagateResourceLimitsExcept
> are configured and the "--propagate" option is not specified, then
> the default action is to propagate all limits.
So, the maximum number of processes should not be propagated from the
submit nodes to the batch nodes. Moreover, I do not know where that high
limit might come from.
In /etc/security/limits.conf we set
* soft nproc 262144
ulimit -u gives me 16384 on the submit nodes.
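
To compare the values on a submit node with those inside a job, it can help to print both the soft and the hard limit, since "ulimit -u" alone shows only the soft one. Here is a minimal sketch using Python's standard resource module; the function name describe_nproc_limits is made up for illustration. It would be run once on the submit node and once inside a batch job.

```python
import resource


def describe_nproc_limits():
    """Return the soft and hard RLIMIT_NPROC of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def fmt(value):
        # RLIM_INFINITY is what "ulimit" reports as "unlimited"
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return {"soft": fmt(soft), "hard": fmt(hard)}


if __name__ == "__main__":
    limits = describe_nproc_limits()
    print(f"RLIMIT_NPROC soft={limits['soft']} hard={limits['hard']}")
```

If the hard limit on one side is lower than the soft limit on the other, that would be consistent with the explanation quoted below.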
The batch jobs are still working as expected, but that "error" message is
On 23.03.2023 at 10:01, Hermann Schwärzler wrote:
> Hi Marcus,
> I am not sure if this is helpful but from looking at the source code
> of Slurm (line 276 of src/slurmd/slurmstepd/ulimits.c in version
> 22.05) it looks like you are explicitly using
> to set resource limits (the one you see when running
> "ulimit -a") on the workers the same as on the submit host.
> The error "Invalid argument" is returned when Slurm wants to set the
> hard limit lower than the (default?) soft limit (in this particular
> case for the maximum number of processes
> ("ulimit -u")).
> Maybe your hard limit for that on the submit host is configured to be
> lower than it is on the worker nodes; Slurm gets this error and shows
> it to you as if you were using the --propagate option?
> On 3/23/23 08:00, Wagner, Marcus wrote:
>> Hi Folks,
>> has anyone ever stumbled upon such an error:
>> slurmstepd: error: Can't propagate RLIMIT_NPROC of 767202 from submit
>> host: Invalid argument
>> Does anyone know where that comes from?
>> Any hints are welcome.
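
The "Invalid argument" case described in the quoted reply can be reproduced outside Slurm. The sketch below only illustrates the general setrlimit() behaviour, not Slurm's actual code path: a request whose soft limit exceeds its hard limit is rejected with EINVAL, which Python reports as a ValueError. The function name try_soft_above_hard and the value 1024 are arbitrary choices for illustration. Because the call is rejected, the limits of the process are left unchanged.

```python
import resource


def try_soft_above_hard(resource_id=resource.RLIMIT_NPROC, hard=1024):
    """Request a soft limit one above the requested hard limit.

    Returns the error text if the request is rejected, or None if it
    was unexpectedly accepted.
    """
    try:
        # soft > hard is invalid; setrlimit() fails with EINVAL
        resource.setrlimit(resource_id, (hard + 1, hard))
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("setrlimit rejected the request:", try_soft_above_hard())
```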