[slurm-users] Using "srun" on compute nodes -- Ray cluster

Ryan Novosielski novosirj at rutgers.edu
Fri Jul 15 16:22:04 UTC 2022

Are you talking about a script submitted via sbatch that contains srun
command lines? If so, there are a lot of reasons to do that. One, as I
understand it, is better instrumentation. Another is that srun --mpi
replaces mpiexec/mpirun/etc., and it is what we recommend at our site
instead (using the PMI2 or PMIx methods).
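
For illustration, a minimal sketch of such a batch script might look
like the one below. The resource numbers and the application name
./my_mpi_app are placeholders I made up, and --mpi=pmix assumes that
Slurm at your site was built with PMIx support ("srun --mpi=list" shows
which types are available). The srun line runs as a job step inside the
allocation that sbatch already obtained, rather than requesting a new
one.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

# Assumption: PMIx is available at the site; use pmi2 if it is not.
MPI_TYPE=pmix

# Hypothetical application binary; replace with a real MPI program.
APP=./my_mpi_app

if command -v srun >/dev/null 2>&1; then
    # srun starts the MPI ranks itself as a job step of this job,
    # so no mpirun/mpiexec wrapper is needed.
    srun --mpi="$MPI_TYPE" "$APP"
else
    # Outside a Slurm environment, only report what would be launched.
    echo "srun not found; inside a job this would run: srun --mpi=$MPI_TYPE $APP"
fi
```

Submitted with "sbatch script.sh", the srun line launches 8 tasks (2
nodes x 4 tasks per node) using the resources of the batch job.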

On 7/15/22 05:17, Kamil Wilczek wrote:
> Dear Slurm Users,
> one of my cluster users would like to run a Ray cluster on Slurm.
> I noticed that the batch script example requires running the "srun"
> command on a compute node that is already allocated:
> https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template
> This is the first time I have seen or heard of this type of usage,
> and I am having trouble wrapping my head around it.
> Is there anything wrong or unusual about this? I understand that
> this would allocate some resources on other nodes. Would
> Slurm enforce limits properly ("qos" or "partition" limits)?
> Kind Regards

  || \\UTGERS,     |----------------------*O*------------------------
  ||_// the State  |    Ryan Novosielski - novosirj at rutgers.edu
  || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
  ||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
