[slurm-users] MPI Jobs OOM-killed which weren't pre-21.08.5

Paul Edmon pedmon at cfa.harvard.edu
Thu Feb 10 14:29:04 UTC 2022

We also noticed the same thing with 21.08.5.  In the 21.08 series 
SchedMD changed the way they handle cgroups to set the stage for cgroups 
v2 (see: https://slurm.schedmd.com/SLUG21/Roadmap.pdf).  21.08.5 then 
introduced a bug fix which caused mpirun to stop pinning properly 
(particularly with older MPI versions): 
https://github.com/SchedMD/slurm/blob/slurm-21-08-5-1/NEWS  What we've 
recommended to users who have hit this is to swap over to using srun 
instead of mpirun, and the situation clears up.
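
For reference, a minimal sketch of what that swap looks like in a batch 
script (the job name, resource numbers, and ./my_mpi_app binary are 
placeholders, not taken from the thread):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_test        # hypothetical job name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=2G

# Launch via srun so Slurm creates each rank's task (and the cgroup
# limits around it) directly, instead of letting mpirun fork the
# ranks itself from one task per node.
srun ./my_mpi_app                  # placeholder MPI binary
```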

-Paul Edmon-

On 2/10/2022 8:59 AM, Ward Poelmans wrote:
> Hi Paul,
> On 10/02/2022 14:33, Paul Brunk wrote:
>> Now we see a problem in which the OOM killer is in some cases
>> predictably killing job steps who don't seem to deserve it.  In some
>> cases these are job scripts and input files which ran fine before our
>> cases these are job scripts and input files which ran fine before our
>> Slurm upgrade.  More details follow, but that's the issue in a
>> nutshell.
> I'm not sure if this is the case but it might help to keep in mind the 
> difference between mpirun and srun.
> With srun you let slurm create tasks with the appropriate mem/cpu etc 
> limits and the mpi ranks will run directly in a task.
> With mpirun you usually let your MPI distribution start one task per 
> node, which spawns the MPI manager, which in turn starts the actual 
> MPI program.
> You might very well end up with different memory limits per process 
> which could be the cause of your OOM issue. Especially if not all MPI 
> ranks use the same amount of memory.
> Ward
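
To see Ward's point concretely, you can print the memory cgroup each 
rank lands in under both launchers and compare. A rough sketch (the 
cgroup path below assumes cgroup v1 with Slurm's memory plugin; under 
cgroup v2 you'd read memory.max instead, and the exact layout depends 
on your distro and slurm.conf):

```shell
# Print each rank's hostname, Slurm rank, and cgroup memory limit.
# Run the same check under mpirun and diff the output.
srun --ntasks=2 bash -c '
  cg=$(awk -F: "\$2 ~ /memory/ {print \$3}" /proc/self/cgroup)
  echo "$(hostname) rank ${SLURM_PROCID}: \
limit=$(cat /sys/fs/cgroup/memory${cg}/memory.limit_in_bytes)"
'
```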