Hello,

We have a use case in which we need to launch multiple concurrently running MPI applications inside a single job allocation. Most supercomputing facilities limit the number of concurrent job steps, because each step adds overhead on the global Slurm scheduler. Some frameworks, such as the Flux framework from LLNL, claim to mitigate this issue by starting an instance of their own scheduler inside the allocation, which then acts as the resource manager for the compute nodes in that allocation.
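For concreteness, this is the kind of pattern we have in mind: several MPI applications launched as concurrent job steps inside one allocation, where every srun must contact the central controller. (The application names and node counts below are just illustrative.)

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=00:30:00

# Launch multiple MPI applications as concurrent job steps within
# one allocation; each srun registers its step with the global
# slurmctld, which is where the per-step overhead comes from.
srun --nodes=2 --ntasks=8 ./app_a &
srun --nodes=2 --ntasks=8 ./app_b &

wait  # block until all job steps have completed
```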
Out of curiosity, I was wondering whether there is a fundamental reason for having a single global scheduler that every srun launch command must contact in order to start a job step. Was it perhaps considered overkill to develop a 'hierarchical' design in which Slurm launches a local scheduling daemon for every allocation, which then manages resources within that allocation? I would appreciate any insight into Slurm's core design.
Thanks and regards,
Kshitij Mehta
Oak Ridge National Laboratory