[slurm-users] How to tell SLURM to ignore specific GPUs

Stephan Roth stephan.roth at ee.ethz.ch
Thu Feb 3 06:30:47 UTC 2022

On 02.02.22 18:32, Michael Di Domenico wrote:
> On Mon, Jan 31, 2022 at 3:57 PM Stephan Roth <stephan.roth at ee.ethz.ch> wrote:
>> The problem is to identify the cards physically from the information we
>> have, like what's reported with nvidia-smi or available in
>> /proc/driver/nvidia/gpus/*/information
>> The serial number isn't shown for every type of GPU and I'm not sure the
>> ones shown match the stickers on the GPUs.
>> If anybody were to know of a practical solution for this, I'd be happy
>> to read it.
> i hadn't seen this proc driver reference before.  checking a few of my
> A100's and V100's and some off hand Quadro cards, i don't see the
> serial number for any of them in the /proc.  sadly this would be
> pretty handy, does anyone know which cards do support this?  i wonder
> if there's some obscure something or other that needs to be turned on
> to dump out the serial number in /proc instead of running nvidia-smi

Sorry, I didn't state clearly what I was referring to.
I never saw the serial number in /proc/driver/nvidia/gpus/*/information,
only in the output of nvidia-smi. Even there, the field was sometimes empty:

nvidia-smi -q |grep -E '^\s+Serial Number\s+:'
     Serial Number                   : N/A
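
To see which cards are affected, the serial number can be listed next to
the PCI bus ID of each GPU. The following is only a sketch: it assumes
the installed nvidia-smi supports the --query-gpu fields index,
pci.bus_id and serial, and that a missing serial number is printed as
"N/A" or "[N/A]" in CSV mode. gpus_without_serial is a made-up helper
name, not part of any tool.

```shell
# Sketch: print "index pci_bus_id" for every GPU that reports no serial number.
# Reads CSV lines of the form "index, pci.bus_id, serial" on stdin.
# gpus_without_serial is a hypothetical helper, not part of nvidia-smi or Slurm.
gpus_without_serial() {
    # Assumption: a missing serial shows up as "N/A" or "[N/A]".
    awk -F', *' '$3 == "N/A" || $3 == "[N/A]" { print $1, $2 }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,pci.bus_id,serial --format=csv,noheader \
        | gpus_without_serial
fi
```

The PCI bus ID at least ties each entry to a slot in the machine, even
when no serial number is reported for the card.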
