I think the best approach is to submit all 10 applications as a single Slurm job, so that their resources are allocated together, and then launch them with one of Slurm's MPMD mechanisms. The details depend on whether each executable is serial, OpenMP, MPI, or hybrid.
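As a rough illustration of one such MPMD mechanism, here is a small Python sketch that generates a configuration file for `srun --multi-prog`, with one line per task rank. The executable names (`./app0` ... `./app9`) and the file name `sim.conf` are placeholders I made up, not anything from the original question. Note that this simple form gives every task the same per-task resources, so if the applications really need different CPU and memory limits, Slurm's heterogeneous job support may be a better fit.

```python
def build_multiprog_conf(executables):
    """Return the text of an srun --multi-prog config file.

    Each line maps one task rank to one executable, so rank 0 runs the
    first program, rank 1 the second, and so on.
    """
    lines = ["# task-rank  executable"]
    for rank, exe in enumerate(executables):
        lines.append(f"{rank} {exe}")
    return "\n".join(lines) + "\n"


def build_srun_command(num_tasks, conf_path):
    """Return the srun command line that launches all tasks in one job step."""
    return f"srun --ntasks={num_tasks} --multi-prog {conf_path}"


if __name__ == "__main__":
    # Hypothetical executable names; replace with the real applications.
    apps = [f"./app{i}" for i in range(10)]
    print(build_multiprog_conf(apps), end="")
    print(build_srun_command(len(apps), "sim.conf"))
```

The generated text would be saved as `sim.conf` and the printed `srun` line placed inside a batch script that requests 10 tasks. Because all tasks belong to one job step, they start together and can be spread across nodes.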

On Mon, Jul 8, 2024 at 2:20 PM Dan Healy via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hi there,

I've received a question from an end user. I presume the answer is "No", but I would like to ask the community first.

Scenario: The user wants to create a series of jobs that all need to start at the same time. Example: there are 10 different executable applications with varying CPU and RAM requirements, all of which need to communicate via TCP/IP. Of course, the user could design some type of idle/status-polling mechanism that waits until all of the jobs, which would start at unpredictable times, are running and only then begins execution, but this feels like a waste of resources. The complete execution of these 10 applications would be considered a single simulation. The goal would be to distribute these 10 applications across the cluster and not necessarily require them all to execute on a single node.

Is there a good architecture for this using SLURM? If so, please kindly point me in the right direction.

--
Thanks,

Daniel Healy
