[slurm-users] Job completed but child process still running

Chris Samuel chris at csamuel.org
Mon Jan 13 15:30:58 UTC 2020


On 1/13/20 5:55 am, Youssef Eldakar wrote:

> In an sbatch script, a user calls a shell script that starts a Java 
> background process. The job completes immediately, but the child Java 
> process is still running on the compute node.
> 
> Is there a way to prevent this from happening?

What I would recommend is to use Slurm's cgroups support so that 
processes that put themselves into the background this way are tracked 
as part of the job and cleaned up when the job exits.

https://slurm.schedmd.com/cgroups.html
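
As a rough sketch of what enabling this looks like on the admin side, the 
settings below are the ones usually involved. The values shown are 
assumptions for illustration, not a tested configuration for your site, so 
please check them against the page above for your Slurm version.

```
# slurm.conf (illustrative fragment)
# Track job processes via cgroups so backgrounded processes stay tied to the job
ProctrackType=proctrack/cgroup
# Use the cgroup task plugin for resource confinement
TaskPlugin=task/cgroup

# cgroup.conf (illustrative fragment)
ConstrainCores=yes
ConstrainRAMSpace=yes
```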

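Depending on how the Java process puts itself into the background, you 
could try adding a "wait" command at the end of the shell script so that 
it doesn't exit immediately. That is not guaranteed to work, though: 
"wait" only waits for background children of the shell that runs it, so 
it returns at once if the Java process has detached itself from that 
shell (for example by forking again and letting its parent exit).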
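
A minimal sketch of the "wait" approach, assuming the workload is started 
with "&" directly from the script so that it stays a child of that shell. 
The run_workload function is a hypothetical stand-in for the real Java 
command.

```shell
#!/bin/bash
# Hypothetical stand-in for the real workload, e.g. "java -jar app.jar".
run_workload() {
    sleep 1
}

# Start the workload in the background from this shell, so it remains
# a direct child that "wait" can track.
run_workload &
child_pid=$!

# Block until that child exits; the job stays alive while it runs.
wait "$child_pid"
child_status=$?
echo "workload exited with status $child_status"
```

If the real launcher script backgrounds Java and then exits on its own, 
the "wait" has to go inside that launcher script rather than in the 
sbatch script that calls it.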

With cgroups, the batch script could also check the processes in your 
cgroup to monitor the existence of the Java process, sleeping for a 
while between checks, and exit when it's no longer found.  For instance, 
once you've got the PID of the Java process you can use "kill -0 $PID" 
to check if it's still there (rather than using ps).  Signal 0 does not 
actually send anything to the process; the command just succeeds while 
the PID exists and you are allowed to signal it.
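
A minimal sketch of such a polling loop follows. The wait_for_pid function 
name, the interval and the pidfile path mentioned in the comment are 
assumptions for illustration, and a short "sleep" stands in for the Java 
process.

```shell
#!/bin/bash
# Poll until the given PID no longer exists, sleeping between checks.
# The second argument is the interval in seconds (illustrative default: 30).
wait_for_pid() {
    pid=$1
    interval=${2:-30}
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
}

# Stand-in for the Java process. In a real job the PID might instead be
# read from a pidfile written by the launcher, e.g. a hypothetical
# java_pid=$(cat "$HOME/app.pid").
sleep 2 &
java_pid=$!

wait_for_pid "$java_pid" 1
echo "process $java_pid is gone"
```

Bear in mind that a PID can be reused by an unrelated process after the 
original one exits, so a very long interval between checks makes the test 
less reliable.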

All the best,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


