[slurm-users] srun jobfarming hassle question
Bjørn-Helge Mevik
b.h.mevik at usit.uio.no
Thu Jan 19 07:23:22 UTC 2023
"Ohlerich, Martin" <Martin.Ohlerich at lrz.de> writes:
> Hello Björn-Helge.
>
>
> Sigh ...
>
> First of all, of course, many thanks! This indeed helped a lot!
Good!
> b) This only works if I have to specify --mem for a task. Although
> manageable, I wonder why one needs to be that restrictive. In
> principle, in the use case outlined, one task could use a bit less
> memory, and the other may require a bit more the half of the node's
> available memory. (So clearly this isn't always predictable.) I only
> hope that in such cases the second task does not die from OOM ... (I
> will know soon, I guess.)
As I understand it, Slurm (at least cgroups) will only kill a step if it
uses more memory *in total* on a node than the job got allocated to the
node. So if a job has 10 GiB allocated on a node, and a step runs two
tasks there, one task could use 9 GiB and the other 1 GiB without the
step being killed.
You can inspect the memory limits that are in effect in cgroups (v1) in
/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid> (usual location, at
least).
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20230119/aa197958/attachment-0001.sig>
More information about the slurm-users
mailing list