<div dir="ltr">I assume you mean the sentence about dynamic MIG at <a href="https://slurm.schedmd.com/gres.html#MIG_Management" target="_blank">https://slurm.schedmd.com/gres.html#MIG_Management</a>.<div>Could it be supported? I think so, but only if one of SchedMD's paying customers (that could be you) asks for it.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 22, 2023 at 11:24 AM Aaron Kollmann &lt;<a href="mailto:aaron.kollmann@student.hpi.de" target="_blank">aaron.kollmann@student.hpi.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hello All,</p>
<p>I am currently working on a research project, and we are trying to
find out whether we can use NVIDIA's Multi-Instance GPU (MIG)
dynamically in SLURM.</p>
<p>For instance:</p>
<p>- a user submits a job that requests a GPU, but none is available</p>
<p>- SLURM then reconfigures a MIG-capable GPU to create a new MIG
instance (e.g. 1g.5gb), which becomes available and is allocated to the job immediately</p>
<p>I can already reconfigure MIG and SLURM within a few seconds and
start jobs on the newly created MIG instances, but running jobs get killed
when I restart slurmd on a node whose MIG configuration has changed
(see the example script below).</p>
<p><b>Do you think it would be possible to develop a plugin, or to
change SLURM, so that dynamic MIG is supported one day?</b></p>
<p>(The website says it is not supported.)</p>
<p>Best</p>
<p>- Aaron</p>
<p><font size="1"><br>
#!/usr/bin/bash<br>
<br>
# Generate Start Config<br>
killall slurmd<br>
killall slurmctld<br>
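# Destroy all existing compute and GPU instances<br>
nvidia-smi mig -dci<br>
nvidia-smi mig -dgi<br>
# GPU 0: 1g.5gb, 2g.10gb and 4g.20gb instances (profiles 19, 14, 5)<br>
nvidia-smi mig -cgi 19,14,5 -i 0 -C<br>
# GPU 1: a single 7g.40gb instance (profile 0)<br>
nvidia-smi mig -cgi 0 -i 1 -C<br>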
cp -f ./slurm-19145-0.conf /etc/slurm/slurm.conf<br>
slurmd -c<br>
slurmctld -c<br>
sleep 5<br>
<br>
# Start one running and one pending job (Slurm later kills the first one)<br>
srun -w gx06 -c 2 --mem 1G --gres=gpu:a100_1g.5gb:1 sleep 300 &<br>
srun -w gx06 -c 2 --mem 1G --gres=gpu:a100_1g.5gb:1 sleep 300 &<br>
sleep 5<br>
<br>
# Simulate a MIG config change: repartition GPU 1 like GPU 0<br>
nvidia-smi mig -i 1 -dci<br>
nvidia-smi mig -i 1 -dgi<br>
nvidia-smi mig -cgi 19,14,5 -i 1 -C<br>
cp -f ./slurm-2x19145.conf /etc/slurm/slurm.conf<br>
killall slurmd<br>
killall slurmctld<br>
slurmd<br>
slurmctld</font><br>
</p>
</div>
</blockquote></div>
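<p>One way to keep running jobs from being killed, at the cost of the "allocated immediately" part of the request, is to drain the node before changing its MIG layout, wait until no jobs are running on it, and only then reconfigure MIG and restart slurmd. The bash functions below are a minimal, hypothetical sketch of that sequence, not an implementation of dynamic MIG in Slurm. The function names and the 10-second polling interval are assumptions; only standard <code>scontrol update</code> and <code>squeue</code> options are used.</p>

```shell
# Hypothetical bash helpers (names are illustrative, not part of Slurm):
# drain a node, wait until no jobs run on it, then return it to service
# after MIG has been reconfigured and slurmd restarted.

# Mark the node DRAIN so that slurmctld starts no new jobs on it;
# jobs that are already running keep running.
drain_node() {
    local node="$1"
    scontrol update NodeName="$node" State=DRAIN Reason="MIG reconfiguration"
}

# Print the number of RUNNING jobs on the node
# (-h: no header, -w: node list, -t: job states).
running_jobs() {
    local node="$1"
    squeue -h -w "$node" -t RUNNING | wc -l
}

# Poll until no jobs are running on the node.
# The default interval of 10 seconds is an arbitrary assumption.
wait_until_idle() {
    local node="$1" interval="${2:-10}"
    while [ "$(running_jobs "$node")" -gt 0 ]; do
        sleep "$interval"
    done
}

# Return the drained node to service once slurmd runs with the new config.
resume_node() {
    local node="$1"
    scontrol update NodeName="$node" State=RESUME
}
```

<p>Usage, with the node name from the script above: call <code>drain_node gx06</code> and <code>wait_until_idle gx06</code>, run the <code>nvidia-smi mig</code> commands and install the new configuration, restart the daemons, then call <code>resume_node gx06</code>. Pending jobs that target the node simply stay pending while it is drained.</p>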