[slurm-users] Detecting non-MPI jobs running on multiple nodes
Loris Bennett
loris.bennett at fu-berlin.de
Thu Sep 29 13:21:18 UTC 2022
Hi Davide,
That is an interesting idea. We already do some averaging, but over the
whole of the past month. For each user we use the output of seff to
generate two scatterplots: CPU-efficiency vs. CPU-hours and
memory-efficiency vs. GB-hours. See
https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik
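seff reports CPU efficiency as the CPU time actually consumed divided by the core-walltime, i.e. elapsed time multiplied by the number of allocated cores. Below is a minimal sketch of that calculation from sacct-style fields (TotalCPU, Elapsed, AllocCPUS). It assumes Slurm's `[D-]HH:MM:SS` / `MM:SS.mmm` duration format, and the helper names are hypothetical. The example also shows why the multi-node case matters: a non-MPI job that keeps only the cores of the first of two equally sized nodes busy can never exceed 50%.

```python
def parse_slurm_time(value: str) -> float:
    """Convert a Slurm duration ([D-]HH:MM:SS, MM:SS or MM:SS.mmm) to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Left-pad so we always have [hours, minutes, seconds].
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """CPU time used divided by core-walltime (elapsed x allocated cores)."""
    core_walltime = parse_slurm_time(elapsed) * alloc_cpus
    return parse_slurm_time(total_cpu) / core_walltime if core_walltime else 0.0


def cpu_hours(elapsed: str, alloc_cpus: int) -> float:
    """Core-hours charged to the job: elapsed hours x allocated cores."""
    return parse_slurm_time(elapsed) * alloc_cpus / 3600


# Hypothetical job: 2 nodes x 16 cores, 10 h elapsed, but only the
# 16 cores on the first node were ever busy (16 x 10 h = 160 h of CPU time).
print(cpu_efficiency("6-16:00:00", "10:00:00", 32))  # 0.5
print(cpu_hours("10:00:00", 32))                      # 320.0
```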
However, I am mainly interested in being able to cancel some of the inefficient
jobs before they have run for too long.
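Cancelling jobs early needs a rule that only judges a job once it has run long enough for start-up phases to be over. A minimal sketch, assuming the CPU time consumed so far is available for each running job (where it comes from, e.g. sstat or node-level monitoring, is left open). The `RunningJob` type and both thresholds are hypothetical; the function only flags jobs for review and does not cancel anything.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    job_id: str
    cpu_seconds: float      # CPU time consumed so far (hypothetical input)
    elapsed_seconds: float  # wall-clock time since the job started
    alloc_cpus: int


def jobs_to_review(jobs, min_elapsed=3600.0, max_efficiency=0.25):
    """Return IDs of jobs that have run at least min_elapsed seconds
    but whose CPU efficiency is still below max_efficiency."""
    flagged = []
    for job in jobs:
        if job.elapsed_seconds < min_elapsed:
            continue  # too early to judge; start-up phases are often serial
        core_walltime = job.elapsed_seconds * job.alloc_cpus
        efficiency = job.cpu_seconds / core_walltime if core_walltime else 0.0
        if efficiency < max_efficiency:
            flagged.append(job.job_id)
    return flagged


jobs = [
    RunningJob("101", 7200, 7200, 32),        # one busy core out of 32
    RunningJob("102", 7200 * 30, 7200, 32),   # 30 of 32 cores busy
    RunningJob("103", 100, 600, 32),          # only 10 minutes old
]
print(jobs_to_review(jobs))  # ['101']
```

Requiring a minimum runtime trades a little wasted compute for fewer false positives on jobs that are simply still starting up.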
Cheers,
Loris
Davide DelVento <davide.quantum at gmail.com> writes:
> At my previous job there were cron jobs running everywhere measuring
> possibly idle cores which were eventually averaged out for the
> duration of the job, and reported (the day after) via email to the
> user support team.
> I believe they stopped doing so when compute became (relatively) cheap
> and memory and I/O became the expensive resources.
>
> I know, it does not help you much, but perhaps it is something to think about.
>
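The cron-based approach described above (sample idle cores periodically, then average over the job's lifetime) can be sketched as follows. This assumes regularly spaced samples, each a list of per-core utilisation values between 0 and 1 for the job's allocated cores; the 10% idle threshold and the function names are hypothetical.

```python
from statistics import mean


def idle_cores(per_core_util, threshold=0.1):
    """Count cores whose utilisation in one sample is below the threshold."""
    return sum(1 for u in per_core_util if u < threshold)


def average_idle_fraction(samples, threshold=0.1):
    """Mean fraction of allocated cores that were idle, over all samples.
    Assumes samples were taken at regular intervals (e.g. by cron)."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean(idle_cores(s, threshold) / len(s) for s in samples)


# Two hypothetical samples of a 4-core allocation.
samples = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(average_idle_fraction(samples))  # 0.625
```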
> On Thu, Sep 29, 2022 at 1:29 AM Loris Bennett
> <loris.bennett at fu-berlin.de> wrote:
>>
>> Hi,
>>
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restrict themselves to a single node,
>> leaving cores idle on all but the first node?
>>
>> I can see that this is potentially not easy, since an MPI job might
>> still have phases where only one core is actually being used.
>>
>> Cheers,
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de
>>
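One possible heuristic for the pattern in the original question: average each node's busy-core fraction over the whole job, then flag multi-node jobs where the first node (typically the batch host) is busy while every other node stays essentially idle. Averaging over many samples is what keeps MPI jobs with short serial phases from being flagged. The input layout, the thresholds and the function names are hypothetical, and collecting per-node samples is left open.

```python
from statistics import mean


def node_means(samples_by_node):
    """Average each node's busy-core fraction (0..1) over all its samples."""
    return {node: mean(samples) for node, samples in samples_by_node.items()}


def looks_single_node(samples_by_node, first_node, busy=0.5, idle=0.05):
    """True if the first node was mostly busy while all other nodes were
    essentially idle on average, i.e. a likely non-MPI multi-node job."""
    if len(samples_by_node) < 2:
        return False  # single-node jobs are not the case we are looking for
    means = node_means(samples_by_node)
    others = [m for node, m in means.items() if node != first_node]
    return means.get(first_node, 0.0) >= busy and all(m <= idle for m in others)


suspect = {"node01": [0.9, 1.0, 0.95], "node02": [0.0, 0.02, 0.0]}
mpi_job = {"node01": [1.0, 1.0, 1.0, 1.0], "node02": [0.0, 1.0, 1.0, 1.0]}
print(looks_single_node(suspect, "node01"))  # True
print(looks_single_node(mpi_job, "node01"))  # False (serial phase averages out)
```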