[slurm-users] How to tell SLURM to ignore specific GPUs

Paul Raines raines at nmr.mgh.harvard.edu
Tue Feb 1 14:41:47 UTC 2022


First, thanks Tim for the nvidia-smi 'drain' pointer.  That works,
but I am still confused about why what I did did not work.

But Esben's reference explains it, though I think the default
behavior is very weird in this case.  I would think SLURM itself
should default to CUDA_DEVICE_ORDER=PCI_BUS_ID.

For this to work, I guess we have to make sure that
CUDA_DEVICE_ORDER=PCI_BUS_ID is set consistently for every process
(slurmd, epilog, prolog, and the job itself), and how to do that
easily is not completely evident.

Would just having a /etc/profile.d/cudaorder.sh guarantee it, or
are there instances where it would be ignored?
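
What I was imagining is something like the following (the file names and
the slurmd unit name are just my assumptions for our setup; as far as I
know a systemd-started slurmd does not source /etc/profile.d, so it would
need its own setting):

   # /etc/profile.d/cudaorder.sh -- seen by login shells and by jobs that
   # inherit the submitting shell's environment
   export CUDA_DEVICE_ORDER=PCI_BUS_ID

   # /etc/systemd/system/slurmd.service.d/cudaorder.conf -- drop-in so
   # slurmd (and, I assume, the prolog/epilog it launches) sees the same
   # setting
   [Service]
   Environment=CUDA_DEVICE_ORDER=PCI_BUS_ID

   # then: systemctl daemon-reload && systemctl restart slurmd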

-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Tue, 1 Feb 2022 3:09am, EPF (Esben Peter Friis) wrote:

> The numbering seen from nvidia-smi is not necessarily the same as the order of /dev/nvidiaXX.
> There is a way to force that, though, using CUDA_DEVICE_ORDER.
>
> See https://shawnliu.me/post/nvidia-gpu-id-enumeration-in-linux/
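>
> For example (my_cuda_app here is just a stand-in for any CUDA program):
>
>     # default ordering is FASTEST_FIRST, which need not match /dev/nvidiaXX
>     ./my_cuda_app
>     # force CUDA device IDs to follow PCI bus order
>     CUDA_DEVICE_ORDER=PCI_BUS_ID ./my_cuda_app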
>
>
> Cheers,
>
> Esben
> ________________________________
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Timony, Mick <Michael_Timony at hms.harvard.edu>
> Sent: Monday, January 31, 2022 15:45
> To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] How to tell SLURM to ignore specific GPUs
>
> I have a large compute node with 10 RTX8000 cards at a remote colo.
> One of the cards on it is acting up, "falling off the bus" once a day
> and requiring a full power cycle to reset.
>
> I want jobs to avoid that card as well as the card it is NVLINK'ed to.
>
> So I modified gres.conf on that node as follows:
>
>
> # cat /etc/slurm/gres.conf
> AutoDetect=nvml
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia0
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia1
> #Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia2
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia3
> #Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia4
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia5
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia6
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia7
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia8
> Name=gpu Type=quadro_rtx_8000 File=/dev/nvidia9
>
> and in slurm.conf I changed the node definition from Gres=gpu:quadro_rtx_8000:10
> to Gres=gpu:quadro_rtx_8000:8.  I restarted slurmctld and slurmd
> after this.
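>
> Roughly, the relevant node line now reads like this (the NodeName and the
> elided fields stand in for the real ones; only the Gres count changed):
>
>   NodeName=gpunode01 ... Gres=gpu:quadro_rtx_8000:8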
>
> I then put the node back from drain to idle.  Jobs were submitted and
> started on the node, but they are using the GPUs I told it to avoid:
>
> +--------------------------------------------------------------------+
> | Processes:                                                         |
> |  GPU   GI   CI        PID   Type   Process name         GPU Memory |
> |        ID   ID                                          Usage      |
> |====================================================================|
> |    0   N/A  N/A     63426      C   python                 11293MiB |
> |    1   N/A  N/A     63425      C   python                 11293MiB |
> |    2   N/A  N/A     63425      C   python                 10869MiB |
> |    2   N/A  N/A     63426      C   python                 10869MiB |
> |    4   N/A  N/A     63425      C   python                 10849MiB |
> |    4   N/A  N/A     63426      C   python                 10849MiB |
> +--------------------------------------------------------------------+
>
> How can I make SLURM not use GPUs 2 and 4?
>
> ---------------------------------------------------------------
> Paul Raines                     http://help.nmr.mgh.harvard.edu
> MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
> 149 (2301) 13th Street     Charlestown, MA 02129            USA
>
>
> You can use the nvidia-smi command to 'drain' the GPUs, which will power
> them down so that no applications will use them.
>
> This answer on the Unix & Linux Stack Exchange explains how to do that:
>
> https://unix.stackexchange.com/a/654089/94412
>
> You can create a script to run at boot and 'drain' the cards.
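>
> Something along these lines (the PCI bus IDs below are placeholders; use
> the ones nvidia-smi reports for the two bad cards):
>
>   # find the PCI bus IDs of the cards to drain
>   nvidia-smi --query-gpu=index,pci.bus_id --format=csv
>   # set the drain state so no new clients attach and the card can power down
>   sudo nvidia-smi drain -p 0000:3B:00.0 -m 1
>   sudo nvidia-smi drain -p 0000:AF:00.0 -m 1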
>
> Regards
> --Mick
>
>
>

