[slurm-users] Job allocation from a heterogeneous pool of nodes

Le, Viet Duc vdle at moasys.com
Sat Dec 17 13:18:58 UTC 2022


Hi Brian,

Thanks for suggesting this interesting feature of Slurm.
And sorry for the late follow-up; I only had access to the cluster for a
short time.

We are now able to run the HPL benchmark across different partitions with
correct NUMA affinity.
For future reference, here is the procedure:

$ salloc \
       --partition=v100 --nodes=1 --ntasks-per-node=40 --gres=gpu:4 : \
       --partition=a100 --nodes=1 --ntasks-per-node=64 --gres=gpu:8

$ srun \
       -n 4 : \
       -n 8   \
       hpl.sh
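
For batch submission, the same two-component request can be written as a
script. This is only a minimal sketch, assuming a Slurm release where the
"#SBATCH hetjob" line separates the components (very old releases used
"#SBATCH packjob" instead); the script name is just a placeholder:

$ cat hpl_het.sbatch
#!/bin/bash
#SBATCH --partition=v100 --nodes=1 --ntasks-per-node=40 --gres=gpu:4
#SBATCH hetjob
#SBATCH --partition=a100 --nodes=1 --ntasks-per-node=64 --gres=gpu:8
srun -n 4 : -n 8 hpl.sh

$ sbatch hpl_het.sbatch

The colon in srun keeps the component order from the allocation, so the
-n 4 tasks land on the v100 node and the -n 8 tasks on the a100 node.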

Initially we thought there would be some performance degradation when
mixing partitions, but at least for this small-scale test it appears to be
negligible.
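
As a quick sanity check that each rank is bound to the intended NUMA
domain, numactl can be launched inside the same allocation. A minimal
sketch, assuming numactl is installed on the compute nodes (the exact
output format varies):

$ srun \
       -n 4 numactl --show : \
       -n 8 numactl --show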

Thanks.
Viet-Duc

On Thu, Dec 8, 2022 at 2:27 AM Brian Andrus <toomuchit at gmail.com> wrote:

> You may want to look here:
>
> https://slurm.schedmd.com/heterogeneous_jobs.html
>
> Brian Andrus
> On 12/7/2022 12:42 AM, Le, Viet Duc wrote:
>
> Dear slurm community,
>
>
> I am encountering a unique situation where I need to allocate jobs to
> nodes with different numbers of CPU cores. For instance:
>
> node01:  Xeon 6226 32 cores
>
> node02:  EPYC 7543 64 cores
>
>
> $ salloc --partition=all --nodes=2 --nodelist=gpu01,gpu02 \
>        --ntasks-per-node=32 --comment=etc
>
> If --ntasks-per-node is larger than 32, the job cannot be allocated,
> since node01 has only 32 cores.
>
>
> In the context of NVIDIA's HPL container, we need to pin MPI
> processes according to NUMA affinity for best performance.
>
> On the HGX-1, the eight A100s have affinity with the 1st, 3rd, 5th, and 7th
> NUMA domains, respectively.
>
> With --ntasks-per-node=32, only the first half of the EPYC's NUMA domains is
> available, so we had to assign the 4th-7th A100s to the 0th and 2nd NUMA
> domains, leading to some performance degradation.
>
>
> I am looking for a way to request more tasks than the number of physically
> available cores, e.g.:
>
> $ salloc --partition=all --nodes=2 --nodelist=gpu01,gpu02 \
>        --ntasks-per-node=64 --comment=etc
>
>
> Your suggestions are much appreciated.
>
>
> Regards,
>
> Viet-Duc
>
>