<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
I haven't seen anything that allows disabling a defined GRES device. It does seem to work if I also define the GPUs that I don't want to use and then explicitly submit jobs to the other GPUs using --gres, e.g. "--gres=gpu:rtx_2080_ti:1". I suppose if I set the
GPU Type to "COMPUTE" for the GPUs I want to use for computing and "UNUSED" for those that I don't, this scheme might work (e.g., --gres=gpu:COMPUTE:3). But then every job submission would have to set this option, which is not a very workable solution.</div>
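<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
A minimal, untested sketch of that scheme for oryx, assuming /dev/nvidia0 is the GT 710 (as the nvidia-smi output quoted below suggests). COMPUTE and UNUSED are just illustrative type labels, not Slurm keywords, and --gres is the option that takes the gpu:type:count form:
<pre>
# slurm.conf: declare both GPUs, each under its own type label
NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:COMPUTE:1,gpu:UNUSED:1

# gres.conf: map each label to its device file (device numbering is an assumption)
Nodename=oryx Name=gpu Type=COMPUTE File=/dev/nvidia1
Nodename=oryx Name=gpu Type=UNUSED File=/dev/nvidia0

# job submission: request only the COMPUTE type (job.sh is a placeholder script name)
sbatch --gres=gpu:COMPUTE:1 job.sh
</pre>
A job that asks for a GPU without naming the type could still be given the UNUSED device, which is why every submission would need the type spelled out.</div>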
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
Thanks!</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
Steve</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Feng Zhang &lt;prod.feng@gmail.com&gt;<br>
<b>Sent:</b> Friday, July 14, 2023 3:09 PM<br>
<b>To:</b> Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> Re: [slurm-users] Unconfigured GPUs being allocated</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">
Very interesting issue.<br>
<br>
I am guessing there might be a workaround: since oryx actually has 2 GPUs,<br>
you could define both of them in Slurm but disable the GT 710. Does<br>
Slurm support this?<br>
<br>
Best,<br>
<br>
Feng<br>
<br>
<br>
On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M &lt;stevew@purdue.edu&gt; wrote:<br>
><br>
> Hi,<br>
><br>
> I manually configure the GPUs in our Slurm configuration (AutoDetect=off in gres.conf) and everything works fine when all the GPUs in a node are configured in gres.conf and available to Slurm. But we have some nodes where a GPU is reserved for running the
display and is specifically not configured in gres.conf. In these cases, Slurm includes this unconfigured GPU and makes it available to Slurm jobs. Using a simple Slurm job that executes "nvidia-smi -L", it will display the unconfigured GPU along with as
many configured GPUs as requested by the job.<br>
><br>
> For example, in a node configured with this line in slurm.conf:<br>
> NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:RTX2080TI:1<br>
> and this line in gres.conf:<br>
> Nodename=oryx Name=gpu Type=RTX2080TI File=/dev/nvidia1<br>
> I will get the following results from a job running "nvidia-smi -L" that requested a single GPU:<br>
> GPU 0: NVIDIA GeForce GT 710 (UUID: GPU-21fe15f0-d8b9-b39e-8ada-8c1c8fba8a1e)<br>
> GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-0dc4da58-5026-6173-1156-c4559a268bf5)<br>
><br>
> But in another node that has all GPUs configured in Slurm like this in slurm.conf:<br>
> NodeName=beluga CoreSpecCount=1 CPUs=16 RealMemory=128500 Gres=gpu:TITANX:2<br>
> and this line in gres.conf:<br>
> Nodename=beluga Name=gpu Type=TITANX File=/dev/nvidia[0-1]<br>
> I get the expected results from the job running "nvidia-smi -L" that requested a single GPU:<br>
> GPU 0: NVIDIA RTX A5500 (UUID: GPU-3754c069-799e-2027-9fbb-ff90e2e8e459)<br>
><br>
> I'm running Slurm 22.05.5.<br>
><br>
> Thanks in advance for any suggestions to help correct this problem!<br>
><br>
> Steve<br>
<br>
</div>
</span></font></div>
</body>
</html>