[slurm-users] running mpi from inside an mpi job
David Schanzenbach
davidls at hawaii.edu
Tue Jun 20 20:53:32 UTC 2023
Hi Mike,
What version of Slurm are you using?
If you are running Slurm 20.11.x or newer, the scheduler behavior was
changed so that, by default, srun no longer allows job steps to overlap
on the same resources.
https://bugs.schedmd.com/show_bug.cgi?id=11863#c3
I would see if adding the --overlap flag to the srun call for the parent
mpi process fixes the problem.
https://slurm.schedmd.com/srun.html#OPT_overlap
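
As a rough sketch of what I mean (untested on your cluster, and assuming
the batch script launches Parent.mpi through srun as a single task, which
is my guess and not something stated in your message), the launch line in
the job script might look like this. The parent_cmd array and the
SLURM_JOB_ID check are only there for illustration, so the script prints
the command instead of running it when tried outside an allocation:

```shell
#!/bin/bash
#SBATCH --ntasks 16
#SBATCH --cpus-per-task 1

# Assumption: the parent runs as one task of the 16-task allocation.
# --overlap allows this step to share its resources with other job steps,
# so steps created later by the children are not blocked by it.
parent_cmd=(srun --overlap --ntasks=1 ./Parent.mpi)

# SLURM_JOB_ID is set by Slurm inside a job. Outside an allocation,
# print the command line instead of trying to run it.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    "${parent_cmd[@]}"
else
    echo "${parent_cmd[*]}"
fi
```

If the children are themselves started with srun, adding --overlap to
those calls as well may be worth trying.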
Thanks,
David
On 6/20/2023 4:08 AM, Vanhorn, Mike wrote:
> I have a user who is submitting a job to slurm which requests 16 tasks, i.e.
>
> #SBATCH --ntasks 16
> #SBATCH --cpus-per-task 1
>
> The Slurm script runs an MPI program called Parent.mpi, which then tries (and fails) to launch 15 MPI child processes. He’s tried two different ways for the parent to spawn the children:
>
>
>   1. A system() call, such as system("srun --ntasks=4 mpirun -np 4 ./child.mpi") or system("mpirun -np 4 ./child.mpi")
>
>   2. MPI_Comm_spawn
>
>
> Both ways generate the following in the slurm output file:
>
> srun: Job ### step creation temporarily disabled, retrying (Requested nodes are busy)
> srun: error: Unable to create step for job ###: Job/step already completing or completed
>
> So, basically, he’s requesting 16 tasks, one of which is used by the parent and the other 15 are supposed to be used by the children, but the children can’t use those 15 because...well, I’m not sure why.
>
> Is there something I need to change in the slurm.conf to allow this to work?
>
> ---
> Mike VanHorn
> Senior Computer Systems Administrator
> College of Engineering and Computer Science
> Wright State University
> 265 Russ Engineering Center
> 937-775-5157
> michael.vanhorn at wright.edu