[slurm-users] srun jobfarming hassle question
Ohlerich, Martin
Martin.Ohlerich at lrz.de
Wed Jan 18 13:39:30 UTC 2023
Hello Bjørn-Helge.
Sigh ...
First of all, of course, many thanks! This indeed helped a lot!
Two comments:
a) Why are the interfaces of Slurm tools changed? I once learned that interfaces should be designed to be as stable as possible; otherwise, users get frustrated and go away.
b) This only works if I specify --mem for each task (as sketched below). Although manageable, I wonder why one needs to be that restrictive. In principle, in the use case outlined, one task could use a bit less memory while the other requires a bit more than half of the node's available memory. (So clearly this isn't always predictable.) I only hope that in such cases the second task does not die from OOM ... (I will know soon, I guess.)
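For concreteness, the kind of split I mean, as a minimal sketch (the memory figures and program names are made up, not from any real job):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --mem=50G                 # made-up whole-node budget

# Each parallel step gets an explicit memory share; the shares must
# fit within the job's allocation for the steps to run side by side.
srun --exact -n 1 --mem=30G ./taskA &> log.A &
srun --exact -n 1 --mem=20G ./taskB &> log.B &
wait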
Really, thank you! That was a very helpful hint!
Cheers, Martin
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Bjørn-Helge Mevik <b.h.mevik at usit.uio.no>
Sent: Wednesday, 18 January 2023 13:49
To: slurm-users at schedmd.com
Subject: Re: [slurm-users] srun jobfarming hassle question
"Ohlerich, Martin" <Martin.Ohlerich at lrz.de> writes:
> Dear Colleagues,
>
>
> for quite some years now, we have repeatedly been facing issues on our clusters with so-called job-farming (or task-farming) concepts in Slurm jobs using srun, and it bothers me that we can hardly help users with requests in this regard.
>
>
> From the documentation (https://slurm.schedmd.com/srun.html#SECTION_EXAMPLES), the pattern reads like this:
>
> ------------------------------------------->
>
> ...
>
> #SBATCH --nodes=??
>
> ...
>
> srun -N 1 -n 2 ... prog1 &> log.1 &
>
> srun -N 1 -n 1 ... prog2 &> log.2 &
Unfortunately, that part of the documentation is not quite up to date.
The semantics of srun have changed a little over the last couple of
years/Slurm versions, so today you have to use "srun --exact ...". From
"man srun" (version 21.08):
--exact
      Allow a step access to only the resources requested for the
      step. By default, all non-GRES resources on each node in the
      step allocation will be used. This option only applies to
      step allocations.
      NOTE: Parallel steps will either be blocked or rejected until
      requested step resources are available unless --overlap is
      specified. Job resources can be held after the completion of
      an srun command while Slurm does job cleanup. Step epilogs
      and/or SPANK plugins can further delay the release of step
      resources.
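Applied to the example above, a minimal sketch of the pattern (the program names and task counts are placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=3

# --exact limits each step to the resources it requests, so both
# steps can run on the node at the same time.
srun --exact -N 1 -n 2 prog1 &> log.1 &
srun --exact -N 1 -n 1 prog2 &> log.2 &
wait  # keep the batch script alive until all background steps finish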
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo