[slurm-users] Reserving slots with sbatch and OpenMpi

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Mon Mar 14 11:17:58 UTC 2022


Yes, sorry.  It is

mpirun -wdir "."  ./parent

I expected mpirun to pick up the job parameters from the SLURM_* environment variables created by sbatch.
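
A minimal sketch of how that inheritance can be checked from inside the batch script (the surrounding script is assumed; only the mpirun line itself is taken from the command above):

#!/bin/bash
# Print the allocation that sbatch exported, to compare against
# what mpirun actually launches.
echo "nodes:     $SLURM_JOB_NUM_NODES"
echo "tasks:     $SLURM_NTASKS"
echo "cpus/task: $SLURM_CPUS_PER_TASK"
echo "nodelist:  $SLURM_JOB_NODELIST"

# With no -np, -host or -hostfile given, Open MPI's mpirun reads the
# Slurm allocation from these SLURM_* environment variables.
mpirun -wdir "." ./parent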

Thanks,
Kurt

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Ralph Castain
Sent: Friday, March 11, 2022 3:48 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [EXTERNAL] Re: [slurm-users] Reserving slots with sbatch and OpenMpi

I assume you are running the job via mpirun? Can you share the mpirun cmd line?



On Mar 11, 2022, at 11:31 AM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov> wrote:

With sbatch, what is the proper way to launch 5 tasks, one per node, while reserving two slots on each node so that each of the original tasks can create one new process using MPI_Comm_spawn?

I’ve tried various combinations of the sbatch arguments --nodes, --ntasks-per-node and --cpu-per-node, but all attempts result in this Open MPI error message:

“All nodes which are allocated for this job are already filled.”

I expected the proper arguments to be --nodes=5  --ntasks=5  --cpus-per-task=2.

The 5 original processes are created correctly, but it seems like MPI_Comm_spawn is causing the error message when it tries to allocate a CPU.

I’m using Slurm 20.11.8 and Open MPI 4.1.2.
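
For reference, a minimal sketch of a submission script built from those arguments (everything except the sbatch options and the mpirun line above is assumed):

#!/bin/bash
#SBATCH --nodes=5           # five nodes
#SBATCH --ntasks=5          # five original parent tasks, one per node
#SBATCH --cpus-per-task=2   # two CPUs per task, intended to leave room
                            # for one MPI_Comm_spawn child per parent

mpirun -wdir "." ./parent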

Thanks,
Kurt
