Aaron (or anyone else),
Did you manage to get Dynamic MIG working in Slurm? I'm actually surprised that after all these years SchedMD has not implemented this feature yet, especially now that newer GPUs allow MIG repartitioning without being root. The only mention of it in their ticketing system is at https://support.schedmd.com/show_bug.cgi?id=11091#c8 (and the subsequent c10), which says it's not on their roadmap, but that was 5 years ago.
I have heard that some users manage dynamic changes by draining nodes, running scripts to reconfigure MIG via nvidia-smi, bringing the node back, and then submitting the job. Has anybody here tried that, and with what success?
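In case it is useful for the discussion, this is roughly the cycle I have in mind; a completely untested sketch, with the node name, GPU index and profile IDs as placeholders:
#!/usr/bin/bash
# Untested sketch of the drain -> re-MIG -> resume cycle; node name, GPU index
# and profile IDs are placeholders. The nvidia-smi/systemctl steps run on the
# node itself (or wrap them in ssh).
NODE=gx06

# Drain the node so nothing new lands on it, then wait for running jobs to finish
scontrol update NodeName="$NODE" State=DRAIN Reason="MIG repartition"
while [ -n "$(squeue -w "$NODE" -h)" ]; do sleep 60; done

# Repartition MIG on GPU 0: destroy compute and GPU instances, create new ones
nvidia-smi mig -i 0 -dci
nvidia-smi mig -i 0 -dgi
nvidia-smi mig -i 0 -cgi 19,14,5 -C

# Restart slurmd so it picks up the new MIG layout, then return the node to service
systemctl restart slurmd
scontrol update NodeName="$NODE" State=RESUME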
I speculate that now that NVIDIA owns SchedMD this feature might get higher priority, but maybe not? Does anybody know anything about it (and isn't bound by an NDA to keep mum)?
Thanks
On Wed, Nov 22, 2023 at 1:22 PM Davide DelVento davide.quantum@gmail.com wrote:
I assume you mean the sentence about dynamic MIG at https://slurm.schedmd.com/gres.html#MIG_Management. Could it be supported? I think so, but only if one of their paying customers (that could be you) asks for it.
On Wed, Nov 22, 2023 at 11:24 AM Aaron Kollmann <aaron.kollmann@student.hpi.de> wrote:
Hello All,
I am currently working on a research project and we are trying to find out whether we can use NVIDIA's Multi-Instance GPU (MIG) dynamically in SLURM.
For instance:
- a user requests a job and wants a GPU, but none is available
- SLURM then reconfigures a MIG GPU to create a partition (e.g. 1g.5gb), which becomes available and is allocated immediately
I can already reconfigure MIG + SLURM within a few seconds to start jobs on newly partitioned resources, but jobs get killed when I restart slurmd on nodes with a changed MIG config (see the script example below).
*Do you think it is possible to develop a plugin or change SLURM to the extent that dynamic MIG will be supported one day?*
(The website says it is not supported)
Best
- Aaron
#!/usr/bin/bash
# Generate Start Config
killall slurmd
killall slurmctld
nvidia-smi mig -dci
nvidia-smi mig -dgi
nvidia-smi mig -cgi 19,14,5 -i 0 -C
nvidia-smi mig -cgi 0 -i 1 -C
cp -f ./slurm-19145-0.conf /etc/slurm/slurm.conf
slurmd -c
slurmctld -c
sleep 5

# Start a running and a pending job (the first job gets killed by slurm)
srun -w gx06 -c 2 --mem 1G --gres=gpu:a100_1g.5gb:1 sleep 300 &
srun -w gx06 -c 2 --mem 1G --gres=gpu:a100_1g.5gb:1 sleep 300 &
sleep 5

# Simulate MIG Config Change
nvidia-smi mig -i 1 -dci
nvidia-smi mig -i 1 -dgi
nvidia-smi mig -cgi 19,14,5 -i 1 -C
cp -f ./slurm-2x19145.conf /etc/slurm/slurm.conf
killall slurmd
killall slurmctld
slurmd
slurmctld
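(As an aside, the two slurm.conf variants the script copies in are not shown; purely as an illustration, the general pattern for exposing MIG slices such as the a100_1g.5gb type used above looks roughly like this:)
# Illustrative sketch only, not the actual slurm-19145-0.conf / slurm-2x19145.conf
# gres.conf on the node: let slurmd discover the MIG instances through NVML
AutoDetect=nvml
# slurm.conf: the node's GRES definition must match the current MIG layout, e.g.
# NodeName=gx06 Gres=gpu:a100_1g.5gb:1,...   (type names and counts are placeholders)
# "slurmd -G" prints the GRES a node actually detects, handy after repartitioning.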
I actually spent a bit of time in the SLURM booth at SC discussing this (and also frequently hanging out in their comfy chairs - easy times on the bad hip).
This is on the back burner for us. The basic problem is that SLURM doesn't have a mechanism to drain a GPU; rather, the entire node has to be drained to make changes. That's the easy description of the problem. There may be ways to do it within the current capabilities of SLURM, but we haven't picked up that effort in earnest, yet...
We do see occasional issues with NVML control when reconfiguring MIG on multi-GPU systems, where our scripts occasionally fail on one or more GPUs, which then need to be reconfigured manually (but that's presumably something in the NVIDIA driver). We either run nodes un-MIGed or split into two nominally equal slices. We're currently defaulting to using MIG on half of our B200s and all of our Max-Qs.
Being able to drain a single GPU would obviously be great.
On Fri, Feb 6, 2026 at 5:07 PM Davide DelVento via slurm-users <slurm-users@lists.schedmd.com> wrote:
Thanks for sharing, this is very good to know. Having to drain the whole node is obviously not ideal, but it's still much better than not being able to do dynamic MIG at all, so perhaps I'll give it a try after I talk to the users to better understand their priorities. I'll keep you posted about what I find out; please do likewise if you end up playing with it. Thanks, and have a great weekend
On Fri, Feb 6, 2026 at 4:16 PM Fulcomer, Sam samuel_fulcomer@brown.edu wrote:
I'm currently embarking on another "teaching SLURM new tricks" project, and won't be working on the semi-auto re-MIG for a while.
We've long had a "Level 3" (DUA-governed) multi-tenant environment for people who have datasets that we can't administratively allow on our main cluster. Until now it's had very limited resources available to the tenants (N x VMs, where more VMs provide more compute).
We're now starting a project to develop a system in which we have, per tenant, one VM to land on as a login node and another to run slurmctld/mariadbd/slurmdbd. Outside the tenant environment we'll have an "uber-SLURM" cluster that fields requests from a broker process sitting between it and the tenant SLURMs, and allocates resources for them, making them available within the tenant's VLAN/subnet space. It's almost as if the uber-SLURM were a cloud resource manager, but rather than running jobs in its cloud it only plucks resources from it to push down into a tenant for a period of time. The tenant will have cloud node resources defined in its SLURM configuration.
At least, that's the idea in broad strokes.
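On the tenant side I'd expect the slurm.conf to look a lot like ordinary cloud-node scheduling; a very rough sketch with made-up names and counts (the resume/suspend scripts would just talk to the broker):
# Tenant-side slurm.conf sketch; every name, path and count here is made up.
# ResumeProgram asks the broker/uber-SLURM for hardware; SuspendProgram hands it back.
ResumeProgram=/usr/local/sbin/broker_request_nodes.sh
SuspendProgram=/usr/local/sbin/broker_release_nodes.sh
ResumeTimeout=600
SuspendTime=1800
# Cloud nodes exist only on paper until the broker actually delivers them.
NodeName=tenant-gpu[01-08] State=CLOUD CPUs=64 RealMemory=512000 Gres=gpu:4
PartitionName=gpu Nodes=tenant-gpu[01-08] Default=YES State=UP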