Hi there,
I've received a question from an end user. I presume the answer is
"No", but I would like to ask the community first.
Scenario: The user wants to create a series of jobs that all need to start
at the same time. Example: there are 10 different executable applications
which have varying CPU and RAM constraints, all of which need to
communicate via TCP/IP. Of course, the user could design some type of
idle/statusing mechanism to wait until all jobs are *randomly* started,
then begin execution, but this feels like a waste of resources. The
complete execution of these 10 applications would be considered a single
simulation. The goal would be to distribute these 10 applications across
the cluster and not necessarily require them all to execute on a single
node.
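
For illustration, the idle/statusing workaround mentioned above might look
roughly like the Python sketch below. It is only a sketch under assumptions:
the names run_coordinator, wait_for_start and demo are hypothetical, the
coordinator address is assumed to be known to every job, and the demo runs
everything on localhost with threads standing in for the separate
applications. Each job checks in over TCP and then idles until the
coordinator has heard from all of them, which is exactly where the wasted
allocation time comes from.

```python
import socket
import threading


def run_coordinator(server_sock, expected):
    """Accept `expected` check-ins, then release every job at once."""
    conns = []
    while len(conns) < expected:
        conn, _addr = server_sock.accept()
        conns.append(conn)
    # All jobs have checked in; tell each one to begin.
    for conn in conns:
        conn.sendall(b"GO\n")
        conn.close()


def wait_for_start(host, port):
    """Check in with the coordinator and block until it sends GO."""
    with socket.create_connection((host, port)) as sock:
        data = b""
        while not data.endswith(b"\n"):
            chunk = sock.recv(16)
            if not chunk:  # coordinator closed without a full message
                break
            data += chunk
    return data.strip() == b"GO"


def demo(n_jobs=10):
    """Local stand-in: threads play the role of the separate jobs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
    server.listen(n_jobs)
    port = server.getsockname()[1]

    coordinator = threading.Thread(target=run_coordinator,
                                   args=(server, n_jobs))
    coordinator.start()

    results = [None] * n_jobs

    def job(i):
        results[i] = wait_for_start("127.0.0.1", port)

    workers = [threading.Thread(target=job, args=(i,)) for i in range(n_jobs)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    coordinator.join()
    server.close()
    return results
```

In a real run, each of the 10 applications would call something like
wait_for_start() before doing any work, so every job that starts early
holds its CPUs and RAM while doing nothing until the last one arrives.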
Is there a good architecture for this using SLURM? If so, please kindly
point me in the right direction.
--
Thanks,
Daniel Healy