27 May 2026
6:46 a.m.
Hello,

We are noticing that some of the GPUs on a specific node have "fallen off the bus". We would like to remove only the affected GPU from the Slurm scheduler. For example, if GPU0 has fallen off the bus, we need the remaining GPUs (GPU1-8) to stay available while GPU0 cannot be allocated. How can we achieve that? I have read about blacklisting on the Slurm forum, but there seems to be no satisfactory solution.

Best,
*Fritz Ratnasamy*
Data Scientist
Information Technology