[slurm-users] CUDA vs OpenCL

Valerio Bellizzomi valerio at selnet.org
Thu May 6 10:00:21 UTC 2021


On Thu, 2021-05-06 at 08:58 +0000, Williams, Gareth (IM&T, Black
Mountain) wrote:
> ROCR_VISIBLE_DEVICES is the closer analogy. GPU_DEVICE_ORDINAL is in
> principle more generic (though does have GPU in the name). OpenCL
> could in principle (can!) run on other devices which could/can have
> more exotic topology, but for the sake of simplicity are likely to be
> presented as a list of devices...
> 
> Gareth   

Here is a ROCm issue discussion on device selection:
https://github.com/RadeonOpenCompute/ROCm/issues/994

ROCm also has a different way to select devices: by serial number,
through the rocm-smi interface. This approach is much more reliable
than using device ordinals:
https://rocmdocs.amd.com/en/latest/ROCm_System_Managment/ROCm-SMI-CLI.html?highlight=showuniqueid
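
A minimal sketch of the idea, assuming the per-device unique IDs have
already been collected (for example by reading the rocm-smi output by
hand). The mapping, the ID values, and both function names below are
hypothetical and only illustrate translating stable IDs back into the
ordinals that a variable such as ROCR_VISIBLE_DEVICES expects:

```python
def ordinals_for_unique_ids(uid_by_ordinal, wanted_uids):
    """Return device ordinals for the requested unique IDs, in request order."""
    # Invert the mapping; compare IDs case-insensitively.
    ordinal_by_uid = {uid.lower(): ordinal for ordinal, uid in uid_by_ordinal.items()}
    missing = [uid for uid in wanted_uids if uid.lower() not in ordinal_by_uid]
    if missing:
        raise KeyError(f"unique IDs not present on this node: {missing}")
    return [ordinal_by_uid[uid.lower()] for uid in wanted_uids]


def visibility_mask(ordinals):
    """Format ordinals as a comma-separated device mask."""
    return ",".join(str(ordinal) for ordinal in ordinals)


# Hypothetical data: ordinal -> unique ID as noted down on one node.
uid_by_ordinal = {0: "0xAAAA0000", 1: "0xBBBB1111", 2: "0xCCCC2222"}

mask = visibility_mask(
    ordinals_for_unique_ids(uid_by_ordinal, ["0xcccc2222", "0xAAAA0000"])
)
print(f"ROCR_VISIBLE_DEVICES={mask}")
```

The point of going through the ID is that the ordinal a given card gets
may differ between nodes or reboots, so the mask is computed on the node
where the job runs rather than hard-coded.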



> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf
> Of Valerio Bellizzomi
> Sent: Thursday, 6 May 2021 6:35 PM
> To: slurm-users at lists.schedmd.com
> Subject: Re: [slurm-users] CUDA vs OpenCL
> 
> On Thu, 2021-05-06 at 08:00 +0000, Williams, Gareth (IM&T, Black
> Mountain) wrote:
> > The post has me thinking so I did a little searching... AMD have an
> > offering that supports OpenCL and they are not NVIDIA. They use a
> > different approach:
> > https://rocmdocs.amd.com/en/latest/Programming_Guides/Opencl-programming-guide.html#masking-visible-devices
> 
> Thank you for the pointer. It seems to me that they just name the
> variable differently (GPU_DEVICE_ORDINAL) but the approach is the
> same.
> 
> 
> > FWIW I have not yet seen anything there about cgroups and enforced
> > device visibility/constraints vs playing nicely with environment
> > variables.
> 
> Here is the documentation on device cgroups:
> https://rocmdocs.amd.com/en/latest/ROCm_System_Managment/ROCm-System-Managment.html?highlight=device%20cgroups#device-cgroup
> 
> 
> > For reference, I have no AMD affiliation and little to no direct 
> > experience.
> > 
> > It is pretty easy to also find what else supports OpenCL
> > (Wikipedia?). Which environment variable to honor seems to me to be
> > mostly a software choice, and most of the software is from vendors,
> > albeit sometimes open source or using or relying on open source
> > components or layers.
> > 
> > Gareth
> > 
> > -----Original Message-----
> > From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf
> > Of 
> > Valerio Bellizzomi
> > Sent: Thursday, 6 May 2021 5:21 PM
> > To: slurm-users at lists.schedmd.com
> > Subject: Re: [slurm-users] CUDA vs OpenCL
> > 
> > On Wed, 2021-04-28 at 10:56 +0200, Valerio Bellizzomi wrote:
> > > Greetings,
> > > I see here https://slurm.schedmd.com/gres.html#GPU_Management
> > > that CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs, what about
> > > OpenCL GPUs?
> > > 
> > > Is there an OPENCL_VISIBLE_DEVICES?
> > > 
> > > 
> > 
> > The lack of follow-up leads me to conclude that there isn't an
> > OpenCL equivalent of CUDA_VISIBLE_DEVICES. It is unfortunate that
> > this open source software is committed to a single GPU supplier.
> > 
> 
> 



