[slurm-users] how can users start their worker daemons using srun?
Chris Samuel
chris at csamuel.org
Tue Aug 28 06:35:18 MDT 2018
On Tuesday, 28 August 2018 10:21:45 AM AEST Chris Samuel wrote:
> That won't happen on a well configured Slurm system as it is Slurm's role to
> clear up any processes from that job left around once that job exits.
Sorry, Reid, for some reason I misunderstood your email and the fact that you
were talking about job steps! :-(
One other option in this case is that you could, say, add 2 cores per node for
the daemons to the overall job request and then do this in your jobs:
srun --ntasks-per-node=1 -c 2 ./foo.py &
and ensure that foo.py doesn't exit after the daemons launch (if you are using
cgroups then those daemons should be contained within the job step's cgroup,
so you should be able to spot their PIDs easily enough).
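For instance, with cgroup v1 and proctrack/cgroup the daemon step's PIDs
typically land somewhere like the following (just an illustration, the exact
hierarchy depends on your site's configuration):

cat /sys/fs/cgroup/freezer/slurm/uid_${UID}/job_${SLURM_JOB_ID}/step_0/tasks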
That then gives you the rest of the cores to play with, so you would launch
future job steps on n-2 cores per node (you could use the environment
variables SLURM_CPUS_PER_TASK & SLURM_NTASKS_PER_NODE to avoid having to
hard-code these, for instance).
Of course, at the end your batch script would need to kill off that first
job step.
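Putting that together, a rough sketch of the batch script might look like
this (./work_step and the core counts are just placeholders, and killing the
daemons via step 0 assumes the srun for foo.py is the first step launched):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=18   # e.g. 16 cores of real work + 2 for the daemons
#SBATCH --cpus-per-task=1

# Job step 0: one daemon launcher per node on 2 cores; foo.py must
# stay alive for as long as the daemons are needed.
srun --ntasks-per-node=1 -c 2 ./foo.py &

# Work out the per-node task count from the environment rather than
# hard-coding it: everything minus the daemons' 2 cores.
WORK_TASKS=$(( SLURM_NTASKS_PER_NODE - 2 ))

# Later job steps then run on the remaining n-2 cores per node.
srun --ntasks-per-node=${WORK_TASKS} ./work_step

# Finally kill off the daemon step (the first srun above is step 0).
scancel ${SLURM_JOB_ID}.0
wait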
Would that help?
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC