[slurm-users] Detecting non-MPI jobs running on multiple nodes

Loris Bennett loris.bennett at fu-berlin.de
Fri Sep 30 05:51:41 UTC 2022


"Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)"
<noam.bernstein at nrl.navy.mil> writes:

>  On Sep 29, 2022, at 10:34 AM, Steffen Grunewald <steffen.grunewald at aei.mpg.de> wrote:
>
>  Hi Noam,
>
>  I'm wondering why one would want to know that - given that there are
>  approaches to multi-node operation beyond MPI (Charm++ comes to mind)?
>
> The thread title requested a way of detecting non-MPI jobs running on multiple nodes.  I assumed that the requester knows, maybe based on their users' software, that there are no legitimate ways for them to run on multiple nodes without MPI.
> Actually, we have users that run embarrassingly parallel jobs which just ssh to the other nodes and gather files, so clearly it can be done in a useful way with very low-tech approaches, but that's an oddball (and just plain old) software package.

There may indeed be legitimate ways for non-MPI jobs to be running on
multiple nodes, but that's a bit of an edge case.  However, such cases
would be fine, as long as the resources requested are being used
efficiently.  Thus, Ward's suggestion about checking for cgroups seems
the most general solution.  Having said that, it would also be useful to
then check the head node for 'mpirun' or similar.
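
As a rough illustration of the head-node check, something like the
following Python sketch could be run on the first node of a suspicious
job.  The list of launcher names and both function names are my own
assumptions, not anything Slurm provides, and the 'ps' call assumes a
Linux procps-style 'ps'.  A site would need to adapt the names to its
MPI installations.

```python
import subprocess

# Assumed set of launcher names; extend or trim for the local MPI stacks.
MPI_LAUNCHERS = {"mpirun", "mpiexec", "orterun", "mpiexec.hydra"}


def find_mpi_launchers(command_names, launchers=MPI_LAUNCHERS):
    """Return the sorted launcher names that occur in command_names."""
    return sorted({name for name in command_names if name in launchers})


def user_command_names(user):
    """List the command names of the given user's processes on this node.

    Uses 'ps -u USER -o comm=', which prints one command name per line
    with no header.  An empty list is returned if 'ps' prints nothing.
    """
    result = subprocess.run(
        ["ps", "-u", user, "-o", "comm="],
        capture_output=True,
        text=True,
        check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    import getpass

    found = find_mpi_launchers(user_command_names(getpass.getuser()))
    if found:
        print("MPI launcher(s) found:", ", ".join(found))
    else:
        print("No MPI launcher found on this node")
```

An empty result only means that no process with one of the listed names
was seen, so it is a hint rather than proof that the job is not using
MPI.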

Cheers,

Loris


