[slurm-users] Reserving slots with sbatch and OpenMpi

Ralph Castain rhc at pmix.org
Mon Mar 14 14:25:42 UTC 2022


Yes and no. mpirun does pick up the basic allocation. However, it does not pick up the details of the process layout from there; you need to put that on the mpirun command line using the options it understands.
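For example, a minimal sketch (assuming Open MPI 4.x option syntax; ./parent is the binary from your script):

    # Hypothetical: state the layout explicitly on the mpirun command
    # line rather than relying on the SLURM_* environment variables.
    # ppr:1:node = one process per node, so with two slots allocated
    # per node the second slot on each node stays free for
    # MPI_Comm_spawn to use later.
    mpirun -np 5 --map-by ppr:1:node ./parent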

Simple reason: Slurm tends to change its environment variables at will and without warning (which is their right), so there is no way we can rely on them for such directions. We have enough trouble just trying to keep up with the changes to ensure we get the basic allocation (i.e., which nodes we are supposed to use).
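Putting it together, a minimal sketch of what the batch script could look like (the job name and binary are placeholders; the options shown are standard sbatch flags and Open MPI 4.x mpirun options):

    #!/bin/bash
    #SBATCH --job-name=spawn-test   # hypothetical job name
    #SBATCH --nodes=5               # five nodes
    #SBATCH --ntasks-per-node=2     # two slots per node: one for the
                                    # initial rank, one reserved for the
                                    # MPI_Comm_spawn child

    # Tell mpirun the layout directly instead of expecting it to infer
    # the layout from SLURM_*: five ranks total, one per node.
    mpirun -np 5 --map-by ppr:1:node ./parent

With two slots allocated per node and only one rank started per node, each node then has an unused slot available for the spawned processes, rather than mpirun reporting that all allocated nodes are already filled.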


> On Mar 14, 2022, at 4:17 AM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov> wrote:
> 
> Yes, sorry.  It is 
>  
> mpirun -wdir "."  ./parent
>  
> I expected mpirun to pick up the job parameters from the SLURM_* environment variables created by sbatch.
>  
> Thanks,
> Kurt
>  
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Ralph Castain
> Sent: Friday, March 11, 2022 3:48 PM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [EXTERNAL] Re: [slurm-users] Reserving slots with sbatch and OpenMpi
>  
> I assume you are running the job via mpirun? Can you share the mpirun cmd line?
>  
> 
> 
> On Mar 11, 2022, at 11:31 AM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov> wrote:
>  
> With sbatch, what is the proper way to launch 5 tasks, one per node, while reserving two slots on each node so that each of the original tasks can create one new process using MPI_Comm_spawn?
>  
> I’ve tried various combinations of the sbatch arguments --nodes, --ntasks-per-node and --cpu-per-node, but all attempts result in this Open MPI error message:
>  
> “All nodes which are allocated for this job are already filled.”
>  
> I expected the proper arguments to be --nodes=5 --ntasks=5 --cpus-per-task=2.
>  
> The 5 original processes are created correctly, but it seems like MPI_Comm_spawn is causing the error message when it tries to allocate a CPU.
>  
> I’m using Slurm 20.11.8 and Open MPI 4.1.2.
>  
> Thanks,
> Kurt
