[slurm-users] Detecting non-MPI jobs running on multiple nodes
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 29 12:51:43 UTC 2022
Hi Loris,
On 9/29/22 09:26, Loris Bennett wrote:
> Has anyone already come up with a good way to identify non-MPI jobs which
> request multiple cores but don't restrict themselves to a single node,
> leaving cores idle on all but the first node?
>
> I can see that this is potentially not easy, since an MPI job might
> still have phases where only one core is actually being used.
Just an idea: the "pestat -F" tool[1] will tell you if any nodes have an
"unexpected" CPU load. If you see the same JobID running on multiple nodes
with an unexpectedly low CPU load, that might point to a job such as you describe.
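A minimal Python sketch of that heuristic, assuming you have already collected a per-node snapshot of CPU load, allocated CPUs and running JobIDs (for example by parsing pestat output). The NodeStatus type, the suspicious_multinode_jobs function and the 0.5 load-per-CPU threshold are hypothetical, and the sketch assumes each node runs only one job, so the node's load can be attributed to that job:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical per-node snapshot, e.g. collected from pestat output."""
    load: float          # CPU load average reported for the node
    alloc_cpus: int      # CPUs allocated on the node
    job_ids: list        # Slurm JobIDs running on the node


def suspicious_multinode_jobs(nodes, min_ratio=0.5):
    """Return JobIDs that span several nodes but leave all but one node underloaded.

    A node counts as underloaded when load / alloc_cpus < min_ratio.
    Assumes nodes are not shared between jobs (hypothetical simplification).
    """
    # Map each JobID to the list of nodes it occupies.
    job_nodes = defaultdict(list)
    for name, status in nodes.items():
        for job_id in status.job_ids:
            job_nodes[job_id].append(name)

    flagged = []
    for job_id, names in sorted(job_nodes.items()):
        if len(names) < 2:
            continue  # single-node jobs are not what we are looking for
        ratios = sorted(
            nodes[n].load / nodes[n].alloc_cpus if nodes[n].alloc_cpus else 0.0
            for n in names
        )
        # Ignore the busiest node (where a serial program would actually run)
        # and flag the job if every remaining node is underloaded.
        if all(r < min_ratio for r in ratios[:-1]):
            flagged.append(job_id)
    return flagged


# Made-up example: job 101 uses 2 nodes but only ~1 core is busy,
# job 102 keeps both of its nodes busy, job 103 runs on a single node.
example = {
    "a001": NodeStatus(load=1.0, alloc_cpus=16, job_ids=["101"]),
    "a002": NodeStatus(load=0.0, alloc_cpus=16, job_ids=["101"]),
    "a003": NodeStatus(load=16.0, alloc_cpus=16, job_ids=["102"]),
    "a004": NodeStatus(load=15.8, alloc_cpus=16, job_ids=["102"]),
    "a005": NodeStatus(load=8.0, alloc_cpus=8, job_ids=["103"]),
}
print(suspicious_multinode_jobs(example))  # → ['101']
```

Since an MPI job can also pass through phases where only one core is busy, a flagged job is only a candidate for a closer look; sampling the loads several times before acting would reduce false positives.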
/Ole
[1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat