[slurm-users] Running an MPI job across two partitions

Renfro, Michael Renfro at tntech.edu
Mon Mar 23 17:23:06 UTC 2020


Others might have more ideas, but anything I can think of would require a lot of manual steps to avoid interfering with jobs in the other partition (allocating resources for a dummy job in that partition, modifying the MPI host list to include its nodes, etc.).
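
For what it's worth, a rough sketch of that manual approach, assuming Open MPI's mpirun can reach the second partition's nodes over ssh (partition names, node counts, and the job ID below are all hypothetical):

    # hold a placeholder allocation in the other partition;
    # salloc prints the job ID it was granted, e.g. 12345
    salloc -p part2 -N 2 --no-shell

    # from inside your real job in the first partition, build a
    # combined hostfile out of both allocations
    scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.txt
    scontrol show hostnames "$(squeue -h -j 12345 -o %N)" >> hosts.txt

    # launch across both sets of nodes, outside Slurm's control
    mpirun --hostfile hosts.txt ./mpi_app

Slurm doesn't know the ranks on the part2 nodes belong to your job, so accounting, containment, and cleanup for them are entirely on you.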

So why not make another partition encompassing both sets of nodes?
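
Nodes are allowed to appear in more than one partition, so the combined partition can simply overlap the existing two. A minimal slurm.conf sketch, with hypothetical partition and node names:

    PartitionName=part1 Nodes=node[01-04] MaxTime=INFINITE State=UP
    PartitionName=part2 Nodes=node[05-08] MaxTime=INFINITE State=UP
    # the same nodes again, as one partition spanning both sets
    PartitionName=both Nodes=node[01-08] MaxTime=INFINITE State=UP

After updating slurm.conf everywhere and running "scontrol reconfigure", a single MPI job can span all eight nodes, e.g. "sbatch -p both -N 8 job.sh", and Slurm schedules and accounts for it like any other job.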

> On Mar 23, 2020, at 10:58 AM, CB <cbalways at gmail.com> wrote:
> 
> Hi Andy,
> 
> Yes, they are on the same network fabric.
> 
> Sure, creating another partition that encompasses all of the nodes of the two (or more) partitions would solve the problem.
> I am wondering if there is any other way, short of creating a new partition?
> 
> Thanks,
> Chansup
> 
> 
> On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy <andy.riebs at hpe.com> wrote:
> When you say “distinct compute nodes,” are they at least on the same network fabric?
> 
> If so, the first thing I’d try would be to create a new partition that encompasses all of the nodes of the other two partitions.
> 
> Andy
> 
> From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of CB
> Sent: Monday, March 23, 2020 11:32 AM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] Running an MPI job across two partitions
> 
> Hi,
> 
> I'm running Slurm version 19.05.
> 
> Is there any way to launch an MPI job on a group of nodes drawn from two or more partitions, where each partition has distinct compute nodes?
> 
> I've looked at the heterogeneous job support, but it creates two separate jobs.
> 
> If there is no such capability with the current Slurm, I'd like to hear any recommendations or suggestions.
> 
> Thanks,
> 
> Chansup
> 


