[slurm-users] Using "srun" on compute nodes -- Ray cluster
Ryan Novosielski
novosirj at rutgers.edu
Fri Jul 15 16:22:04 UTC 2022
Are you talking about a script that is submitted via sbatch and contains
srun command lines? If so, there are a lot of reasons to do that. One is
better instrumentation, as I understand it. Another is that srun --mpi
can replace mpiexec/mpirun/etc., which is what we recommend at our site
(using the PMI2 or PMIx methods).
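
For illustration, here is a minimal sketch of that pattern: a small Python helper that renders a batch script whose payload line is srun --mpi=... rather than mpirun. Inside a batch job, srun starts a job step within the job's existing allocation, so it does not request a new one. The helper name render_batch_script, the program path ./my_mpi_app, and the resource numbers are all made up for the example. Which MPI plugins are usable depends on how Slurm was built at your site; srun --mpi=list shows what is available.

```python
import shlex


def render_batch_script(program, nodes=2, ntasks_per_node=4,
                        mpi_plugin="pmix", time_limit="00:10:00"):
    """Return the text of a Slurm batch script that launches `program` via srun.

    `program` and all default values are placeholders for illustration only.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --time={time_limit}",
        "",
        "# srun starts a job step inside this job's allocation;",
        "# it takes the place of mpirun/mpiexec here.",
        f"srun --mpi={mpi_plugin} {shlex.quote(program)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; save it to a file and submit that file with sbatch.
    print(render_batch_script("./my_mpi_app"), end="")
```

The printed text can be saved to a file and handed to sbatch; swapping mpi_plugin to "pmi2" covers the other method mentioned above.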
On 7/15/22 05:17, Kamil Wilczek wrote:
> Dear Slurm Users,
>
> one of my cluster users would like to run a Ray cluster on Slurm.
> I noticed that the batch script example requires running the "srun"
> command on a compute node, which already is allocated:
> https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template
>
> This is the first time I have seen or heard of this type of usage,
> and I am having trouble wrapping my head around it.
> Is there anything wrong or unusual about this? I understand that
> this would allocate some resources on other nodes. Would
> Slurm enforce limits properly ("qos" or "partition" limits)?
>
> Kind Regards
--
#BlackLivesMatter
____
|| \\UTGERS, |----------------------*O*------------------------
||_// the State | Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark
`'