<div dir="auto">Hi Brian,<div dir="auto">Thanks,</div><div dir="auto">Yes, we have a single node entry; it's just that I accidentally included the commented-out node entry as well when pasting the config file into the message. Sorry about that.</div><div dir="auto"><br></div><div dir="auto">So, from what you mention, I should add some QOS settings to the partitions in order to set proper CPU affinities, right?</div><br><br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Sat, May 8, 2021, 12:15 PM Brian Andrus &lt;<a href="mailto:toomuchit@gmail.com">toomuchit@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div>

    <p>Cristóbal, </p>

    <p>Your approach is a little off.</p>

    <p>Slurm needs to know about the node properties. It can then

      allocate them based on job/partition.</p>

    <p>So, you should have a single "NodeName" entry for the node that

      accurately describes everything you want to allow access to.</p>

    <p>Then you limit what is allowed to be requested in the partition

      definition and/or a QOS (if you are using accounting).</p>

    <p>Brian Andrus<br>

    </p>

    <div>On 5/7/2021 8:11 PM, Cristóbal Navarro

      wrote:<br>

    </div>

    <blockquote type="cite">

      
      <div dir="ltr">

        <div>Hi community,</div>

        <div>I am unable to tell if SLURM is handling the following

          situation efficiently in terms of CPU affinities at each

          partition.</div>

        <div><br>

        </div>

        <div>Here we have a very small cluster with just one GPU node

          with 8 GPUs, which offers two partitions: "gpu" and

          "cpu".</div>

        <div>

          <div>Part of the Config File</div>

          <div><span style="font-family:monospace">## Nodes list<br>

              ## use native GPUs<br>

              NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16

              ThreadsPerCore=1 RealMemory=1024000 State=UNKNOWN

              Gres=gpu:A100:8 Feature=gpu<br>

              <br>

              ## Default CPU layout (same total cores as others)<br>

              #NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16

              ThreadsPerCore=1 RealMemory=1024000 State=UNKNOWN

              Gres=gpu:a100:4,gpu:a100_20g:2,gpu:a100_10g:2,gpu:a100_5g:16

              Feature=ht,gpu<br>

              <br>

              ## Partitions list<br>

              PartitionName=gpu OverSubscribe=FORCE MaxCPUsPerNode=64

              DefCpuPerGPU=8 DefMemPerGPU=65556 MaxTime=1-00:00:00

              State=UP Nodes=nodeGPU01  Default=YES <br>

              PartitionName=cpu OverSubscribe=FORCE MaxCPUsPerNode=64

              DefMemPerNode=16384 MaxTime=1-00:00:00 State=UP

              Nodes=nodeGPU01</span></div>

          <div><span style="font-family:monospace"><br>

            </span></div>

          <div><span style="font-family:monospace"><br>

            </span></div>

        </div>

        <div>The node has 128 CPU cores (2x 64-core AMD CPUs, SMT

          disabled), and resources have been split through the

          partition options, with MaxCPUsPerNode=64 for each partition.</div>

        <div>The gres file is auto-generated with NVML, and it follows the

          GPU topology shown below (note the CPU affinity column):<br>

        </div>

        <div><span style="font-family:monospace">➜  ~ nvidia-smi topo -m<br>

            GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 mlx5_0 mlx5_1 mlx5_2

            mlx5_3 mlx5_4 mlx5_5 mlx5_6 mlx5_7 mlx5_8 mlx5_9 CPU

            Affinity NUMA Affinity<br>

            GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 PXB PXB SYS SYS

            SYS SYS SYS SYS SYS SYS                              

            48-63            3<br>

            GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 PXB PXB SYS SYS

            SYS SYS SYS SYS SYS SYS                              

            48-63            3<br>

            GPU2 NV12 NV12 X NV12 NV12 NV12 NV12 NV12 SYS SYS PXB PXB

            SYS SYS SYS SYS SYS SYS                              

            16-31            1<br>

            GPU3 NV12 NV12 NV12 X NV12 NV12 NV12 NV12 SYS SYS PXB PXB

            SYS SYS SYS SYS SYS SYS                              

            16-31            1<br>

            GPU4 NV12 NV12 NV12 NV12 X NV12 NV12 NV12 SYS SYS SYS SYS

            PXB PXB SYS SYS SYS SYS                              

            112-127          7<br>

            GPU5 NV12 NV12 NV12 NV12 NV12 X NV12 NV12 SYS SYS SYS SYS

            PXB PXB SYS SYS SYS SYS                              

            112-127          7<br>

            GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS SYS SYS

            SYS SYS PXB PXB SYS SYS                              

            80-95            5<br>

            GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS SYS SYS

            SYS SYS PXB PXB SYS SYS                              

            80-95            5<br>

          </span></div>

        <div><br>

        </div>

        If we look closely, we can see specific CPU affinities for the

        GPUs. I therefore assume that multi-core CPU jobs should use

        the 64 CPU cores that are not listed here, e.g., cores 0-15,

        32-47, ...<br>
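        <div>A quick sketch of that reasoning, assuming the CPU Affinity column above lists every GPU-affine core and that core IDs run from 0 to 127. The Python names here are illustrative only and are not part of Slurm; the script just computes which cores are left over:</div>
<pre>
```python
# CPU Affinity column from the nvidia-smi topo -m output (GPU index to core range).
GPU_CPU_AFFINITY = {
    0: "48-63", 1: "48-63",
    2: "16-31", 3: "16-31",
    4: "112-127", 5: "112-127",
    6: "80-95", 7: "80-95",
}
TOTAL_CORES = 128  # 2x 64-core CPUs, SMT disabled


def parse_range(text):
    """Expand an inclusive 'first-last' string into a set of core IDs."""
    first, last = (int(part) for part in text.split("-"))
    return set(range(first, last + 1))


def to_ranges(cores):
    """Collapse a collection of core IDs into sorted 'first-last' strings."""
    ranges = []
    for core in sorted(cores):
        if ranges and core == ranges[-1][1] + 1:
            ranges[-1][1] = core  # extend the current run
        else:
            ranges.append([core, core])  # start a new run
    return [f"{first}-{last}" for first, last in ranges]


# Union of all cores that some GPU is affine to, and the complement of that set.
gpu_cores = set().union(*(parse_range(r) for r in GPU_CPU_AFFINITY.values()))
free_cores = set(range(TOTAL_CORES)) - gpu_cores

print("GPU-affine cores:", ",".join(to_ranges(gpu_cores)))
print("Remaining cores: ", ",".join(to_ranges(free_cores)))
```
</pre>
        <div>For this node, the remaining cores come out as 0-15,32-47,64-79,96-111, i.e. 64 cores in four blocks of 16.</div>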

        <div>

          <div>Will SLURM realize that CPU jobs should have this core

            affinity? If not, is there a way I can make the default CPU

            affinities the correct ones for all jobs launched on the

            "cpu" partition?</div>

          <div>Any help is welcome<br>

          </div>

          <div>-- <br>

          </div>

          <div>

            <div dir="ltr" data-smartmail="gmail_signature">

              <div dir="ltr">

                <div>

                  <div dir="ltr">

                    <div>

                      <div dir="ltr">Cristóbal A. Navarro<br>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

  </div>


</blockquote></div></div>