Dan,

Your requirement for varying CPU and RAM per application sounds like it could be met with the Heterogeneous Jobs feature of Slurm (https://slurm.schedmd.com/heterogeneous_jobs.html). Take a look at that document and see whether it meets your needs.
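As a rough illustration, here is a small Python helper that writes such a batch script. It is a sketch I have not tested against any particular site's configuration. It creates one heterogeneous component per application, separates the components with "#SBATCH hetjob" lines, launches each component as its own background "srun --het-group=N" step, and ends with "wait" so the batch script does not exit early. Because the components of a heterogeneous job are allocated together, the applications should start at roughly the same time without a polling loop. The Component class, the build_hetjob_script function and the executable names are hypothetical. Please check the option names against the sbatch/srun man pages for your Slurm version.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One application in the simulation (hypothetical structure for this sketch)."""
    command: str        # executable plus arguments, e.g. "./solver --port 5000"
    ntasks: int         # number of tasks (processes) for this component
    cpus_per_task: int  # CPUs allocated to each task
    mem: str            # memory per node for this component, e.g. "8G"


def build_hetjob_script(components):
    """Return the text of a Slurm heterogeneous batch script, one component per application."""
    if not components:
        raise ValueError("need at least one component")
    lines = ["#!/bin/bash"]
    for i, c in enumerate(components):
        if i > 0:
            # Separator line that starts the next heterogeneous component.
            lines.append("#SBATCH hetjob")
        lines.append(
            f"#SBATCH --ntasks={c.ntasks} --cpus-per-task={c.cpus_per_task} --mem={c.mem}"
        )
    lines.append("")
    # Run each application as its own job step on its own component, all in the
    # background, then wait so the job lasts as long as the longest application.
    for i, c in enumerate(components):
        lines.append(f"srun --het-group={i} {c.command} &")
    lines.append("wait")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: two of the ten applications with different needs.
    apps = [
        Component("./solver", ntasks=1, cpus_per_task=8, mem="32G"),
        Component("./visualizer", ntasks=1, cpus_per_task=2, mem="4G"),
    ]
    print(build_hetjob_script(apps), end="")
```

The output would be saved to a file and submitted with sbatch. Launching each component as a separate background step keeps the applications independent, which suits programs that talk to each other over TCP/IP rather than through a shared MPI communicator.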

 

Mike Robbert

Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research Computing

Information and Technology Solutions (ITS)

303-273-3786 | mrobbert@mines.edu  


On 7/8/24, 14:20, "Dan Healy via slurm-users" <slurm-users@lists.schedmd.com> wrote:

 

Hi there,

 

I've received a question from an end user to which I presume the answer is "No", but I would like to ask the community first.

 

Scenario: The user wants to create a series of jobs that all need to start at the same time. For example, there are 10 different executable applications with varying CPU and RAM requirements, all of which need to communicate via TCP/IP. Of course, the user could build some kind of idle/status-polling mechanism that waits until all the jobs have been started (at whatever times the scheduler happens to start them) and only then begins execution, but this feels like a waste of resources. The complete execution of these 10 applications would be considered a single simulation. The goal is to distribute these 10 applications across the cluster rather than requiring them all to run on a single node.

 

Is there a good architecture for this using SLURM? If so, please kindly point me in the right direction.

 

--

Thanks,

Daniel Healy