[slurm-users] Detecting non-MPI jobs running on multiple nodes
Loris Bennett
loris.bennett at fu-berlin.de
Thu Sep 29 13:40:59 UTC 2022
Hi Ole,
Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
> Hi Loris,
>
> On 9/29/22 09:26, Loris Bennett wrote:
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restrict themselves to a single node,
>> leaving cores idle on all but the first node?
>> I can see that this is potentially not easy, since an MPI job might
>> still have phases where only one core is actually being used.
>
> Just an idea: The "pestat -F" tool[1] will tell you if any nodes have an
> "unexpected" CPU load. If you see the same JobID running on multiple nodes
> with too low a CPU load, that might point to a job such as you describe.
>
> /Ole
>
> [1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
I do already use 'pestat -F', although it flags over 100 of our 170
nodes, which results in a bit of information overload. It would be
nice if the sensitivity of the flagging could be tweaked on the
command line, so that only the worst nodes are shown.
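A minimal sketch of the kind of tunable filter meant here. It does not reproduce pestat's actual flagging logic: the NodeLoad record, the worst_nodes function and the relative-deviation criterion are all hypothetical. The load and allocation figures would have to be collected separately, for example from the CPULoad and CPUAlloc fields of `scontrol show node`.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """CPU load and allocated cores for one node (hypothetical input format)."""
    name: str
    cpu_load: float   # e.g. the node's load average
    alloc_cpus: int   # cores currently allocated to jobs by Slurm


def worst_nodes(nodes, rel_threshold=0.5):
    """Return names of nodes whose load deviates from their allocation by more
    than rel_threshold (as a fraction of allocated cores), worst first."""
    flagged = []
    for node in nodes:
        if node.alloc_cpus == 0:
            continue  # nothing allocated, so nothing to compare against
        deviation = abs(node.cpu_load - node.alloc_cpus) / node.alloc_cpus
        if deviation > rel_threshold:
            flagged.append((deviation, node.name))
    flagged.sort(reverse=True)  # largest deviation first
    return [name for _, name in flagged]


if __name__ == "__main__":
    nodes = [
        NodeLoad("n001", 1.0, 32),   # 32 cores allocated, 1 busy
        NodeLoad("n002", 31.5, 32),  # fully used
        NodeLoad("n003", 10.0, 16),  # partly used
    ]
    print(worst_nodes(nodes, rel_threshold=0.5))  # ['n001']
```

Raising rel_threshold shrinks the list to the most extreme nodes, which is the kind of knob that would reduce 100 flagged nodes to a manageable handful.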
I also use some wrappers around 'sueff' from
https://github.com/ubccr/stubl
to generate part of an ASCII dashboard (a 'dasciiboard'?), which looks
like this:
Username  Mem_Request  Max_Mem_Use  CPU_Efficiency  Number_of_CPUs_In_Use
alpha     42000M       0.03Gn       48.80%          (0.98 of 2)
beta      10500M       11.01Gn      99.55%          (3.98 of 4)
gamma     8000M        8.39Gn       99.64%          (63.77 of 64)
...
chi       varied       3.96Gn       83.65%          (248.44 of 297)
phi       1800M        1.01Gn       98.79%          (248.95 of 252)
omega     16G          4.61Gn       99.69%          (127.60 of 128)
== Above data from: Thu 29 Sep 15:26:29 CEST 2022 =============================
The whole thing just loops every 30 seconds. This is what I use to spot
users with badly configured jobs.
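The two CPU columns in the table are related: the number of CPUs in use is the efficiency times the allocated cores (48.80% of 2 cores is 0.98 cores, 83.65% of 297 is 248.44). A minimal sketch of how such a per-user cell could be computed, assuming the usual definition efficiency = CPU time / (elapsed time × allocated cores) summed over a user's jobs; this is not taken from sueff's source, and cpu_columns is a hypothetical name.

```python
def cpu_columns(jobs):
    """Format the CPU_Efficiency and Number_of_CPUs_In_Use cells for one user.

    jobs: list of (cpu_seconds, elapsed_seconds, alloc_cpus), one tuple per job.
    Assumption: efficiency = total CPU time / (elapsed time * allocated cores).
    """
    # Effective number of busy cores per job is CPU time / wall time
    busy_cores = sum(cpu / elapsed for cpu, elapsed, _ in jobs)
    alloc_cores = sum(n for _, _, n in jobs)
    efficiency = 100.0 * busy_cores / alloc_cores
    return f"{efficiency:.2f}% ({busy_cores:.2f} of {alloc_cores})"


if __name__ == "__main__":
    # A 2-core job that accumulated 976 CPU-seconds in 1000 s of wall time
    print(cpu_columns([(976, 1000, 2)]))  # 48.80% (0.98 of 2)
```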
However, I'd really like to be able to identify non-MPI jobs on multiple
nodes automatically.
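One possible way to automate this, following Ole's idea of looking for the same job with low load on several nodes, is sketched below. It assumes some monitoring source can supply, per job, the number of busy cores and allocated cores on each of its nodes; how to collect those figures is left open, and the names looks_single_node and MultiNodeIdleDetector as well as the thresholds are hypothetical. A job is flagged only if it looks "single-node" in min_samples consecutive snapshots, which is meant to address the caveat that an MPI job may have phases where only one core is busy.

```python
from collections import defaultdict


def looks_single_node(per_node, idle_frac=0.05):
    """Heuristic check on one snapshot of a job.

    per_node: list of (busy_cores, alloc_cores), one tuple per node of the job.
    True if the job spans at least two nodes but at most one of them does
    noticeable work (busy fraction above idle_frac).
    """
    if len(per_node) < 2:
        return False  # single-node jobs are not what we are looking for
    active = sum(1 for busy, alloc in per_node if alloc > 0 and busy / alloc > idle_frac)
    return active <= 1


class MultiNodeIdleDetector:
    """Flag jobs that look single-node in min_samples consecutive snapshots."""

    def __init__(self, min_samples=10, idle_frac=0.05):
        self.min_samples = min_samples
        self.idle_frac = idle_frac
        self.streak = defaultdict(int)  # job id -> consecutive suspicious snapshots

    def update(self, snapshot):
        """snapshot: dict mapping job id -> per-node list as for looks_single_node."""
        flagged = []
        for job_id, per_node in snapshot.items():
            if looks_single_node(per_node, self.idle_frac):
                self.streak[job_id] += 1
            else:
                self.streak[job_id] = 0  # one busy snapshot resets the count
            if self.streak[job_id] >= self.min_samples:
                flagged.append(job_id)
        # Forget jobs that have finished since the last snapshot
        for job_id in list(self.streak):
            if job_id not in snapshot:
                del self.streak[job_id]
        return sorted(flagged)


if __name__ == "__main__":
    detector = MultiNodeIdleDetector(min_samples=3)
    snapshot = {
        "1001": [(1.0, 16), (0.0, 16)],    # only the first node busy: suspicious
        "1002": [(15.8, 16), (15.9, 16)],  # all nodes busy, MPI-like
        "1003": [(4.0, 4)],                # single-node job, ignored
    }
    for _ in range(3):
        result = detector.update(snapshot)
    print(result)  # ['1001']
```

Run from a 30-second loop like the dashboard, min_samples=10 would mean a job has to look idle on all but one node for about five minutes before it is reported.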
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de