[slurm-users] Assign Tasks to Specific GPUs in a Multi-node Job

Le, Viet Duc vdle at moasys.com
Fri Sep 15 08:26:56 UTC 2023


Dear Slurm Community,

We are looking for a way to assign tasks to specific GPUs in a multi-node
job.

Let's consider a partition consisting of identical DGX-A100 nodes.
In exclusive mode, when selecting one A100 per node via the `--gres` option,
the first GPU (i.e. CUDA_VISIBLE_DEVICES=0) is always selected.

[Slurm script]
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=dgx
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1

# List the GPU visible to each task
srun nvidia-smi -L
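One way to check which GPU each task actually receives is to launch a small
reporting script once per task with `srun` from the batch script above. This
is a sketch assuming a POSIX shell on the compute nodes; `report_gpu` and the
file name `report_gpu.sh` are hypothetical. Depending on the cluster's cgroup
configuration, CUDA_VISIBLE_DEVICES may hold a job-relative index, so the PCI
bus ID reported by nvidia-smi is the more reliable way to tell physical cards
apart.

```shell
#!/bin/sh
# Hypothetical reporting script, e.g. launched as "srun ./report_gpu.sh".
# Prints the node name and the GPU index the current task sees.
report_gpu() {
  # SLURMD_NODENAME is set by Slurm for each task; fall back to uname outside Slurm.
  printf '%s: CUDA_VISIBLE_DEVICES=%s\n' \
    "${SLURMD_NODENAME:-$(uname -n)}" "${CUDA_VISIBLE_DEVICES:-unset}"
  # If devices are constrained by cgroups, the index above may be job-relative;
  # the PCI bus ID identifies the physical card.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader
  fi
}

report_gpu
```

Running it on both nodes and comparing the PCI bus IDs would show whether
the same physical GPU is chosen on every node.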

Is finer-grained control over GPU selection supported? For instance,
something like the following (hypothetical syntax):

[Slurm script]
#SBATCH --nodelist=gpu01:id=0,gpu02:id=2

The above would assign the task on gpu01 to GPU 0 and the task on gpu02 to
GPU 2.
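Without such an option, a possible workaround (a sketch, not a Slurm
feature) is to allocate whole nodes so that every GPU is visible to the job,
for example `--exclusive` together with `--gres=gpu:8` on 8-GPU DGX A100
nodes, and let a per-task wrapper pick its GPU from a node-to-index map.
`GPU_MAP`, `gpu_for_node`, and the map format (which mirrors the
hypothetical `--nodelist` syntax above) are illustrative assumptions;
SLURMD_NODENAME is the node name Slurm sets in each task's environment.

```shell
#!/bin/sh
# Hypothetical per-task wrapper: pick one GPU per node from a static map.
# Assumes every GPU on the node is visible to the job (whole-node allocation).

# Map format mirrors the proposed syntax: "<node>:id=<gpu index>,...".
GPU_MAP=${GPU_MAP:-"gpu01:id=0,gpu02:id=2"}

# Print the GPU index listed for node $1 in map $2; return 1 if not listed.
gpu_for_node() {
  node=$1
  map=$2
  old_ifs=$IFS
  IFS=,                      # split the map on commas only
  for entry in $map; do
    case $entry in
      "$node":id=*)
        IFS=$old_ifs
        printf '%s\n' "${entry#*:id=}"   # strip the "<node>:id=" prefix
        return 0
        ;;
    esac
  done
  IFS=$old_ifs
  return 1
}

# Only act when running inside a Slurm task with a command to wrap.
if [ -n "${SLURMD_NODENAME:-}" ] && [ "$#" -gt 0 ]; then
  if gpu=$(gpu_for_node "$SLURMD_NODENAME" "$GPU_MAP"); then
    export CUDA_DEVICE_ORDER=PCI_BUS_ID   # number devices as nvidia-smi does
    export CUDA_VISIBLE_DEVICES="$gpu"
  fi
  exec "$@"
fi
```

It would be launched as, e.g., `srun ./select_gpu.sh python train.py`
(hypothetical file and program names). Whether the chosen index refers to
the intended physical card still depends on how the site constrains devices,
so it is worth verifying with nvidia-smi.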
After going through both the manual and the support mailing list, it seems
that this feature is unsupported, presumably because of its niche use case.
But in case we have overlooked something trivial, or there is an ingenious
way to achieve the same effect, we would appreciate your suggestions.

Regards.
Viet-Duc