[slurm-users] how can users start their worker daemons using srun?

Priedhorsky, Reid reidpr at lanl.gov
Tue Aug 28 17:10:34 MDT 2018


> On Aug 28, 2018, at 6:35 AM, Chris Samuel <chris at csamuel.org> wrote:
> 
> On Tuesday, 28 August 2018 10:21:45 AM AEST Chris Samuel wrote:
> 
>> That won't happen on a well-configured Slurm system, as it is Slurm's role to
>> clean up any processes left around from that job once it exits.
> 
> Sorry Reid, for some reason I misunderstood your email and the fact you were 
> talking about job steps! :-(
> 
> One other option in this case is that you can, say, add 2 cores per node for the 
> daemons to the overall job request and then do something like this in your jobs:
> 
> srun --ntasks-per-node=1 -c 2 ./foo.py &

Thanks Chris.
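
If I follow, that suggestion amounts to something like the following batch script (just a sketch: foo.py is from your example, while the 36-CPU node size, the 34-task application, and ./my_app are placeholders):

  #!/bin/bash
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=36   # 34 CPUs per node for the app + 2 for the daemon
  #SBATCH --cpus-per-task=1

  # Per-node daemon: one task with 2 CPUs on each node, run in the background.
  srun --ntasks-per-node=1 -c2 -- ./foo.py &
  daemon_pid=$!

  # Application: the remaining 34 CPUs on each node.
  srun --ntasks-per-node=34 -c1 -- ./my_app

  # Stop the daemon step once the application is done.
  kill $daemon_pid
  wait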

As a simplified test, I tried the following:

  $ srun --ntasks-per-node=1 -c1 -- sleep 15 &
  [1] 180948
  $ srun --ntasks-per-node=1 -c1 -- hostname
  srun: Job step creation temporarily disabled, retrying
  srun: Job step created
  cn001.localdomain
  [1]+  Done                    srun --ntasks-per-node=1 -c1 -- sleep 15

and the second srun still waits until the first is complete.

This is surprising to me: my understanding is that the first srun should allocate only one CPU, leaving 35 for the second srun, which also needs only one CPU and so should not have to wait.
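
For completeness, here is the same test as a self-contained batch script (again just a sketch; the 5-minute time limit is arbitrary, and the 36-CPU node size matches the nodes I'm testing on):

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --ntasks=36        # one whole 36-CPU node
  #SBATCH --time=00:05:00

  # Step 1: should need only 1 of the 36 CPUs; run it in the background.
  srun --ntasks-per-node=1 -c1 -- sleep 15 &

  # Step 2: also needs only 1 CPU, so I expect it to start immediately,
  # but instead srun reports "Job step creation temporarily disabled,
  # retrying" and waits for step 1 to finish.
  srun --ntasks-per-node=1 -c1 -- hostname

  wait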

Is this behavior expected?
Am I missing something?

Thanks,
Reid



