[slurm-users] Specify a gpu ID

Ahmad Khalifa underoath006 at gmail.com
Fri Jun 4 18:42:45 UTC 2021


Thank you for your input, Jason. I wasn't trying to "chide" you in any way.
I appreciate your contribution to the discussion.

On Fri, Jun 4, 2021 at 11:37 AM Jason Simms <simmsj at lafayette.edu> wrote:

> You don't need to chide me for proposing what is, to me, a reasonable
> solution. *You* may not be able to make hardware changes, but leaving
> failing GPUs in a system, when there are people who can remove them, is
> anathema to my approach to cluster management. In other words, I do not
> recommend you try to find a workaround for a problem that, in my opinion,
> is best solved by eliminating the faulty hardware. I understand the
> impulse, and if there is a simple way to specify a specific GPU, then
> fine, do that. But again, it goes against treating such resources as
> generic: nodes and hardware should be thought of as cattle, not pets, and
> should be managed accordingly. Again, I believe you are trying to solve a
> problem that should not be yours to solve. Sorry if this irritates you.
>
> JLS
>
> On Fri, Jun 4, 2021 at 2:17 PM Ahmad Khalifa <underoath006 at gmail.com>
> wrote:
>
>> I can't make hardware changes, but I still want to make use of the
>> cluster. Let's keep the discussion on how to get slurm to do it, if that's
>> possible.
>>
>> On Fri, Jun 4, 2021 at 11:13 AM Jason Simms <simmsj at lafayette.edu> wrote:
>>
>>> Unpopular opinion: remove the failing GPU.
>>>
>>> JLS
>>>
>>> On Fri, Jun 4, 2021 at 2:07 PM Ahmad Khalifa <underoath006 at gmail.com>
>>> wrote:
>>>
>>>> Because there are failing GPUs that I'm trying to avoid.
>>>>
>>>> On Fri, Jun 4, 2021 at 5:04 AM Stephan Roth <stephan.roth at ee.ethz.ch>
>>>> wrote:
>>>>
>>>>> On 03.06.21 07:11, Ahmad Khalifa wrote:
>>>>> > How to send a job to a particular gpu card using its ID (0,1,2...etc)?
>>>>>
>>>>> Why do you need to access a GPU based on its ID?
>>>>>
>>>>> If it's to select a certain GPU type, there are other methods you can
>>>>> use.
>>>>>
>>>>> You could create partitions for the same GPU types or add features.
>>>>> Because our nodes are heterogeneous, with mixed GPU types, we do the
>>>>> latter: we added one feature for the GPU architecture and one for the
>>>>> GPU type to each node.
>>>>>
>>>>> Cheers,
>>>>> Stephan
>>>>>
>>>>>
>>>
>>> --
>>> *Jason L. Simms, Ph.D., M.P.H.*
>>> Manager of Research and High-Performance Computing
>>> XSEDE Campus Champion
>>> Lafayette College
>>> Information Technology Services
>>> 710 Sullivan Rd | Easton, PA 18042
>>> Office: 112 Skillman Library
>>> p: (610) 330-5632
>>>
>>
>