[slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

Guillaume De Nayer denayer at hsu-hh.de
Wed Jun 15 15:36:44 UTC 2022


On 06/15/2022 05:25 PM, Ward Poelmans wrote:
> Hi Guillaume,
> 
> On 15/06/2022 16:59, Guillaume De Nayer wrote:
>>
>> Perhaps I misunderstand the Slurm documentation...
>>
>> I thought that the --exclusive option used in combination with sbatch
>> will reserve the whole node (40 cores) for the job (submitted with
>> sbatch). This part is working fine. I can check it with sacct.
>>
>> Then, this job starts subtasks on the reserved 40 cores with srun.
>> Therefore I'm using "-n1 -c1" in combination with "srun". I thought that
>> it was possible to use the reserved cores inside this job using srun.
> 
> You're correct. --exclusive will give you all cores on the nodes but
> only as much memory as requested.
> 
>  
>> The following slightly modified job without --exclusive and with
>> --ntasks=2 leads to a similar problem: Only one srun is running at a
>> time. The second starts directly after the first one finished.
>>
>> #!/bin/bash
>> #SBATCH --job-name=test_multi_prog_srun
>> #SBATCH --ntasks=2
>> #SBATCH --partition=short
>> #SBATCH --time=02:00:00
>>
>> srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
>> srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
>> wait
> 
> This should work... It works on our cluster. Are you sure they don't run
> in parallel?
> 

Yes, I'm pretty sure that they do not run in parallel: sacct shows
only one subtask as "RUNNING". When that subtask is marked
"COMPLETED", the second one appears and is marked "RUNNING".

Moreover, if I connect directly to the node, only one "sleep" process
is running.
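
A wall-clock check is another way to tell the two cases apart. This is
only a minimal sketch with shortened sleep times: the helper name
run_step and the fallback that runs the command without srun outside a
Slurm allocation are assumptions made for illustration, not part of the
job above.

```shell
#!/bin/bash
# Hypothetical helper: launch a command as a one-core job step when inside a
# Slurm allocation, or run it directly otherwise (so the sketch can be tried
# on any machine).
run_step() {
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        srun --exact -n1 -c1 "$@"
    else
        "$@"
    fi
}

start=$(date +%s)
run_step sleep 2 &      # first step, in the background
run_step sleep 3 &      # second step, in the background
wait                    # block until both steps have finished
elapsed=$(( $(date +%s) - start ))

# Overlapping steps take about as long as the longest one (about 3 s);
# serialized steps take about the sum (about 5 s).
echo "elapsed: ${elapsed}s"
```

Inside a real job, the Start and End columns of
"sacct -j <jobid> --format=JobID,State,Start,End" should tell the same
story per step.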

OK. If it works on your cluster, perhaps I have a problem in my Slurm
config. Which version of Slurm are you using on your cluster? And can
you share your slurm.conf?
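
For comparing two clusters, the version and a few scheduling settings
can be read from the running configuration with "scontrol show config".
The sketch below is hedged: the function name filter_sched_settings is
hypothetical, and the choice of keys (version, select plugin, default
memory) is only a guess at what might matter here.

```shell
#!/bin/bash
# Hypothetical filter: keep only the "Key = value" lines for the listed keys.
filter_sched_settings() {
    grep -E '^(SLURM_VERSION|SelectType|SelectTypeParameters|DefMemPerCPU|DefMemPerNode)[[:space:]]*='
}

if command -v scontrol >/dev/null 2>&1; then
    # Print the effective configuration and keep the lines of interest.
    scontrol show config | filter_sched_settings
else
    echo "scontrol not found; run this on a cluster node" >&2
fi
```

Running it on both clusters and diffing the output is a quick way to
spot a difference before comparing the full slurm.conf files.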

> We usually recommend to use gnu parallel or xargs like:
> 
> xargs -P $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact sleep 30
> 

OK. I will install GNU parallel and also test your xargs command.
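
A complete xargs invocation needs its arguments on standard input. The
following is a minimal sketch of how I understand the suggestion, with
shortened sleep times; the launcher variable and the empty fallback
outside a Slurm allocation are assumptions added so it can be tried
anywhere.

```shell
#!/bin/bash
# Inside a Slurm allocation every command becomes its own job step; outside
# one the prefix is empty (illustrative fallback only).
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    launcher="srun -N 1 -n 1 -c 1 --exact"
else
    launcher=""
fi

start=$(date +%s)
# One sleep duration per input line: -n 1 passes one argument per command,
# -P caps how many commands run at the same time. $launcher is left unquoted
# on purpose so that it splits into separate words.
printf '%s\n' 2 3 | xargs -n 1 -P "${SLURM_NTASKS:-2}" $launcher sleep
xargs_elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${xargs_elapsed}s"
```

With GNU parallel the same idea would presumably read
"parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact sleep {} ::: 2 3".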

Thx a lot!
Guillaume



