[slurm-users] Job completed but child process still running
Chris Samuel
chris at csamuel.org
Mon Jan 13 15:30:58 UTC 2020
On 1/13/20 5:55 am, Youssef Eldakar wrote:
> In an sbatch script, a user calls a shell script that starts a Java
> background process. The job immediately is completed, but the child Java
> process is still running on the compute node.
>
> Is there a way to prevent this from happening?
What I would recommend is to use Slurm's cgroups support so that
processes that put themselves into the background this way are tracked
as part of the job and cleaned up when the job exits.
https://slurm.schedmd.com/cgroups.html
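Depending on how the Java process puts itself into the background, you
could try adding a "wait" command at the end of the shell script so that
it doesn't exit immediately. This is not guaranteed to work, though:
"wait" only waits for children of the shell that calls it, so it has no
effect if the Java process detaches itself from that shell.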
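
As a rough sketch of the "wait" approach, assuming the Java process is
started with a plain "&" from the same shell script: the "sleep" below
is only a stand-in for the real launcher command, which I don't know.

```shell
#!/bin/bash
# Stand-in for the real background launcher (hypothetical); in the
# user's script this would be the line that starts Java with "&".
sleep 1 &

# Block until every background child of this shell has exited, so the
# script (and hence the job) does not finish early.
wait
echo "all background children finished"
```

If the launcher is a separate script that backgrounds Java and then
exits, the "wait" has to go in that script, not in the sbatch script
that calls it.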
With cgroups the Slurm script could also check the processes in your
cgroup to monitor the existence of the Java process, sleeping for a
while between checks, and exit when it's no longer found. For instance
once you've got the PID of the Java process you can use "kill -0 $PID"
to check if it's still there (rather than using ps).
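
A minimal sketch of such a polling loop, assuming you already have the
PID in a variable: here "sleep" stands in for the Java process, and the
one-second interval is an arbitrary choice you would likely lengthen.

```shell
#!/bin/bash
# Stand-in for the Java process (hypothetical); in a real job the PID
# would come from wherever the launcher records it.
sleep 2 &
PID=$!

# "kill -0" sends no signal; it only reports whether the PID exists
# and whether we are allowed to signal it.
while kill -0 "$PID" 2>/dev/null; do
    sleep 1   # arbitrary polling interval
done
echo "process $PID is gone"
```

The batch script only reaches the final line, and so only exits, once
the process has disappeared.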
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA