[slurm-users] Detecting non-MPI jobs running on multiple nodes

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 29 12:51:43 UTC 2022


Hi Loris,

On 9/29/22 09:26, Loris Bennett wrote:
> Has anyone already come up with a good way to identify non-MPI jobs which
> request multiple cores but don't restrict themselves to a single node,
> leaving cores idle on all but the first node?
> 
> I can see that this is potentially not easy, since an MPI job might
> still have phases where only one core is actually being used.

Just an idea: The "pestat -F" tool[1] will tell you if any nodes have an 
"unexpected" CPU load.  If you see the same JobID running on multiple nodes 
whose CPU load is too low, that might point to a job such as you describe.
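
To make the idea concrete, here is a rough sketch of the check in Python. It
is not how pestat works internally; it only illustrates the heuristic. The
function name suspicious_jobs, the three input dictionaries, and the 0.5
threshold are my own assumptions for illustration. The inputs would have to
be collected separately, for example from "squeue -h -t R -o '%i %N'" (with
"scontrol show hostnames" to expand the node list) and "sinfo -N -h -o
'%N %O %C'". Note that the load is per node, not per job, so the check is
only meaningful when nodes are not shared between jobs.

```python
def suspicious_jobs(job_nodes, node_load, node_alloc_cpus, threshold=0.5):
    """Flag multi-node jobs where all nodes but (at most) one look idle.

    job_nodes:       {job_id: [node, ...]}   (hypothetical input format)
    node_load:       {node: CPU load average}
    node_alloc_cpus: {node: number of allocated CPUs}
    threshold:       a node counts as underloaded if its load is below
                     threshold * allocated CPUs (0.5 is an arbitrary guess)
    """
    flagged = []
    for job_id, nodes in job_nodes.items():
        if len(nodes) < 2:
            continue  # single-node jobs are not the problem described here
        underloaded = [
            n for n in nodes
            if node_alloc_cpus.get(n, 0) > 0
            and node_load.get(n, 0.0) < threshold * node_alloc_cpus[n]
        ]
        # A non-MPI job typically keeps only its first node busy.
        if len(underloaded) >= len(nodes) - 1:
            flagged.append(job_id)
    return flagged


# Made-up example data: job 1001 is busy on n1 only, job 1002 uses both nodes.
job_nodes = {"1001": ["n1", "n2", "n3"], "1002": ["n4", "n5"], "1003": ["n6"]}
node_load = {"n1": 15.8, "n2": 0.1, "n3": 0.05, "n4": 31.5, "n5": 32.2, "n6": 0.0}
node_alloc_cpus = {"n1": 16, "n2": 16, "n3": 16, "n4": 32, "n5": 32, "n6": 16}

flagged = suspicious_jobs(job_nodes, node_load, node_alloc_cpus)
print(flagged)
```

As Loris points out, an MPI job can legitimately have single-core phases, so
a job flagged once is only a candidate; it would be worth sampling several
times before contacting the user.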

/Ole

[1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat


