[slurm-users] Specify a gpu ID

Jason Simms simmsj at lafayette.edu
Fri Jun 4 18:35:53 UTC 2021


You don't need to chide me for offering what is, to me, a reasonable
solution. *You* may not be able to make hardware changes, but leaving
failing GPUs in a system, when there are people who can remove them, is
anathema to my approach to cluster management. In other words, I do not
recommend that you look for a workaround to a problem that, in my opinion,
is best solved by eliminating the faulty hardware. I understand the
impulse, and if there is a simple way to select a specific GPU, then fine,
do that. But again, it goes against treating such resources as generic:
nodes and hardware should be thought of as cattle, not pets, and should be
managed accordingly. Again, I believe you are trying to solve a problem
that should not be yours to solve. Sorry if this irritates you.

JLS

On Fri, Jun 4, 2021 at 2:17 PM Ahmad Khalifa <underoath006 at gmail.com> wrote:

> I can't make hardware changes, but I still want to make use of the
> cluster. Let's keep the discussion on how to get slurm to do it, if that's
> possible.
>
> On Fri, Jun 4, 2021 at 11:13 AM Jason Simms <simmsj at lafayette.edu> wrote:
>
>> Unpopular opinion: remove the failing GPU.
>>
>> JLS
>>
>> On Fri, Jun 4, 2021 at 2:07 PM Ahmad Khalifa <underoath006 at gmail.com>
>> wrote:
>>
>>> Because there are failing GPUs that I'm trying to avoid.
>>>
>>> On Fri, Jun 4, 2021 at 5:04 AM Stephan Roth <stephan.roth at ee.ethz.ch>
>>> wrote:
>>>
>>>> On 03.06.21 07:11, Ahmad Khalifa wrote:
>>>> > How to send a job to a particular gpu card using its ID (0,1,2...etc)?
>>>>
>>>> Why do you need to access a GPU based on its ID?
>>>>
>>>> If it's to select a certain GPU type, there are other methods you can
>>>> use.
>>>>
>>>> You could create partitions for the same GPU types or add features.
>>>> Because our nodes are heterogeneous, with mixed GPU types, we do the
>>>> latter: we added one feature for the GPU architecture and one for the
>>>> GPU type to each node.
>>>>
>>>> Cheers,
>>>> Stephan
>>>>
>>>>
>>
>> --
>> *Jason L. Simms, Ph.D., M.P.H.*
>> Manager of Research and High-Performance Computing
>> XSEDE Campus Champion
>> Lafayette College
>> Information Technology Services
>> 710 Sullivan Rd | Easton, PA 18042
>> Office: 112 Skillman Library
>> p: (610) 330-5632
>>
>

-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
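
For reference, the feature-based approach Stephan describes in the quoted
message could be scripted roughly as below. This is only a sketch under
stated assumptions: the feature names "pascal" and "gtx1080ti" and the
script name "train.sh" are hypothetical, and the helper sbatch_command is
made up for illustration. It relies on the sbatch options --gres and
--constraint. Note that this selects nodes by feature; it does not pick an
individual card by its ID, which is what the original question asked for.

```python
import shlex


def sbatch_command(script, features, gpus=1):
    """Build an sbatch argv requesting GPUs on nodes that carry every given feature.

    The feature names are whatever the site admin assigned to the nodes;
    nothing here checks that they actually exist on the cluster.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    argv = ["sbatch", f"--gres=gpu:{gpus}"]
    if features:
        # In --constraint, '&' means the node must have all listed features.
        argv.append("--constraint=" + "&".join(features))
    argv.append(script)
    return argv


# Hypothetical feature names: one for the architecture, one for the GPU type.
cmd = sbatch_command("train.sh", ["pascal", "gtx1080ti"])

# shlex.join quotes the '&' so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

The argv list can be passed to subprocess.run as-is; the printed form is
only for copying into a terminal.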