[slurm-users] How to tell SLURM to ignore specific GPUs

Michael Di Domenico mdidomenico4 at gmail.com
Wed Feb 2 17:32:30 UTC 2022


On Mon, Jan 31, 2022 at 3:57 PM Stephan Roth <stephan.roth at ee.ethz.ch> wrote:
> The problem is to identify the cards physically from the information we
> have, like what's reported with nvidia-smi or available in
> /proc/driver/nvidia/gpus/*/information
> The serial number isn't shown for every type of GPU and I'm not sure the
> ones shown match the stickers on the GPUs.
> If anybody were to know of a practical solution for this, I'd be happy
> to read it.

i hadn't seen this /proc driver reference before.  checking a few of my
A100s and V100s and some offhand Quadro cards, i don't see the
serial number for any of them under /proc/driver/nvidia/gpus/*/information.
that's a shame, since it would be pretty handy.  does anyone know which
cards do support this?  i wonder if there's some obscure option that needs
to be turned on to expose the serial number in /proc, so it can be read
without running nvidia-smi.
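
for anyone who wants to compare the two sources side by side, here is a rough sketch, not a tested tool.  it assumes the information files are plain "Key: value" lines with "Model" and "GPU UUID" fields, that the UUID printed there matches the one nvidia-smi reports, and that nvidia-smi accepts uuid and serial as --query-gpu fields.  the function names are made up for illustration.

```python
import glob
import os
import subprocess


def parse_information(text):
    """Parse 'Key: value' lines from an information file into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses (0000:3b:00.0) stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def proc_gpus(pattern="/proc/driver/nvidia/gpus/*/information"):
    """Return {directory_name: fields} for every matching information file."""
    gpus = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            # The directory name is assumed to be the PCI address.
            name = os.path.basename(os.path.dirname(path))
            gpus[name] = parse_information(fh.read())
    return gpus


def smi_serials():
    """Return {uuid: serial} as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=uuid,serial", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    serials = {}
    for line in out.splitlines():
        uuid, _, serial = line.partition(",")
        if uuid.strip():
            serials[uuid.strip()] = serial.strip()
    return serials


def report():
    """Print PCI address, model, UUID and serial (if any) for each GPU."""
    serials = smi_serials()
    for pci, fields in proc_gpus().items():
        uuid = fields.get("GPU UUID", "?")
        print(pci, fields.get("Model", "?"), uuid, serials.get(uuid, "n/a"))
```

calling report() on a GPU node prints one line per card.  whether the serial column is populated depends on what nvidia-smi returns for that model, so this only helps line up /proc entries with nvidia-smi output; it does not make the serial appear in /proc.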


