<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Diego,</p>
    <p>Not to start a debate, but I guess it comes down to how you look at it.</p>
    <p>From Intel's descriptions:</p>
    <p style="box-sizing: border-box; margin: 0px 0px 16px; line-height:
      1.625rem; position: relative; color: rgb(38, 38, 38); font-family:
      intel-clear, tahoma, Helvetica, helvetica, Arial, sans-serif;
      font-size: 20px; font-style: normal; font-variant-ligatures:
      normal; font-variant-caps: normal; font-weight: 400;
      letter-spacing: normal; orphans: 2; text-align: start;
      text-indent: 0px; text-transform: none; white-space: normal;
      widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;
      background-color: rgb(255, 255, 255); text-decoration-thickness:
      initial; text-decoration-style: initial; text-decoration-color:
      initial;"><font size="2">How does Hyper-Threading work? When
        Intel® Hyper-Threading Technology is active, the CPU exposes two
        execution contexts per physical core. This means that one
        physical core now works like two “logical cores” that can handle
        different software threads. The ten-core <a
href="https://www.intel.com/content/www/us/en/gaming/i9-desktop-processors-for-gaming.html"
          style="text-decoration: none; box-sizing: border-box;
          background: transparent; color: rgb(0, 104, 181);">Intel®
          Core™ i9-10900K</a><span> </span>processor, for example, has
        20 threads when Hyper-Threading is enabled.</font></p>
    <p style="box-sizing: border-box; margin: 0px 0px 16px; line-height:
      1.625rem; position: relative; color: rgb(38, 38, 38); font-family:
      intel-clear, tahoma, Helvetica, helvetica, Arial, sans-serif;
      font-size: 20px; font-style: normal; font-variant-ligatures:
      normal; font-variant-caps: normal; font-weight: 400;
      letter-spacing: normal; orphans: 2; text-align: start;
      text-indent: 0px; text-transform: none; white-space: normal;
      widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;
      background-color: rgb(255, 255, 255); text-decoration-thickness:
      initial; text-decoration-style: initial; text-decoration-color:
      initial;"><font size="2">Two logical cores can work through tasks
        more efficiently than a traditional single-threaded core. <i>By
          taking advantage of idle time when the core would formerly be
          waiting for other tasks to complete</i>, Intel®
        Hyper-Threading Technology improves CPU throughput (by up to 30%
        in server applications<sup style="box-sizing: border-box;
          font-size: 10px; line-height: 0; position: relative;
          vertical-align: top; top: 0.5em;">3</sup>).</font></p>
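    <p>(As a quick aside: you can see that pairing on any Linux node,
      nothing Slurm-specific about it. A rough sketch:</p>
    <p>$ lscpu | grep -E 'Thread|Core|Socket'<br>
      $ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list</p>
    <p>The first shows threads per core, the second shows which two
      "logical CPUs" are the hyper-thread siblings of core 0.)</p>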
    <div class="moz-cite-prefix">So if we are creating code that is
      hypothetically 100% efficient (it can use the CPU 100% of the
      time), there would be no 'idle' time for another process. If work
      is done on that other process, it would be at the expense of the
      100% efficiency enjoyed by our 'perfect' process.</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">Of course, the true performance answer
      lies in how any of the processes work, which is why some of us do
      so many experimental runs of jobs and gather timings. We have yet
      to see a 100% efficient process, but folks are improving things
      all the time.</div>
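    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">If you want to measure it for your own
      codes, a rough A/B comparison is usually enough. Just a sketch
      ("./my_app" is a placeholder for whatever you actually run):</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">$ sbatch -N1 --hint=nomultithread --wrap "srun ./my_app"<br>
      $ sbatch -N1 --hint=multithread --wrap "srun ./my_app"<br>
      $ sacct -o JobID,Elapsed</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">Compare the elapsed times of the two
      jobs and let the numbers decide for that particular code.</div>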
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">Brian Andrus<br>
    </div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">On 2/13/2023 9:56 PM, Diego Zuccato
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:c5c7a4ed-8630-fdfb-fac0-f46e3db3ca28@unibo.it">I think
      that's incorrect:
      <br>
      > The concept of hyper-threading is not doubling cores. It is a
      single
      <br>
      > core that can 'instantly' switch work from one process to
      another.
      <br>
      > Only one is being worked on at any given time.
      <br>
      <br>
      A core can have multiple (usually 2) independent execution
      pipelines, so that multiple instructions from different threads
      run concurrently. It does not switch from one to the other.
      <br>
      But it does have some shared resources, like the MMU and sometimes
      the FPU (maybe only on older AMD processors). Having a single MMU
      means that all the instructions running on a core must have the
      same "view" of the memory space, and that means that they must
      come from a single process. IOW that they're multiple threads of a
      single process.
      <br>
      <br>
      If the sw you're going to run makes good use of multithreading,
      having hyperthreading can be a great boost. If the sw only uses
      multitasking, then hyperthreading is a net loss (not only can't
      you use half the available threads, you also usually get slower
      clock speeds).
      <br>
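      For example (just a sketch, with placeholder program names): a
      single multithreaded task can be handed both hyper-threads of its
      cores with
      <br>
      <br>
      srun -n1 -c8 --hint=multithread ./my_openmp_app
      <br>
      <br>
      while a bunch of independent single-threaded tasks is usually
      better off pinned to physical cores:
      <br>
      <br>
      srun -n4 -c1 --hint=nomultithread ./my_serial_app
      <br>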
      <br>
      Diego
      <br>
      <br>
      Il 13/02/2023 15:29, Brian Andrus ha scritto:
      <br>
      <blockquote type="cite">Hermann makes a good point.
        <br>
        <br>
        The concept of hyper-threading is not doubling cores. It is a
        single core that can 'instantly' switch work from one process to
        another. Only one is being worked on at any given time.
        <br>
        <br>
        So if I request a single core on a hyper-threaded system, I
        would not be pleased to find you are giving it to someone else
        1/2 the time. I would need to have the actual core assigned. If
        I request multiple cores and my app is only going to affect
        itself, then I _may_ benefit from hyper-threading.
        <br>
        <br>
        In general, enabling hyper-threading is not the best practice
        for efficient HPC jobs. The goal is that every process is
        utilizing the CPU as close to 100% as possible, which would
        render hyper-threading moot.
        <br>
        <br>
        Brian Andrus
        <br>
        <br>
        On 2/13/2023 12:15 AM, Hermann Schwärzler wrote:
        <br>
        <blockquote type="cite">Hi Sebastian,
          <br>
          <br>
          I am glad I could help (although not exactly as expected :-).
          <br>
          <br>
          With your node-configuration you are "circumventing" how Slurm
          behaves, when using "CR_Core": if you read the respective part
          in
          <br>
          <br>
          <a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/slurm.conf.html">https://slurm.schedmd.com/slurm.conf.html</a>
          <br>
          <br>
          it says:
          <br>
          <br>
          "CR_Core
          <br>
            [...] On nodes with hyper-threads, each thread is counted as
          a CPU to satisfy a job's resource requirement, but multiple
          jobs are not allocated threads on the same core."
          <br>
          <br>
          That's why you got a full core (both threads) when allocating
          a single CPU. Or e.g. four threads when allocating three CPUs,
          and so forth.
          <br>
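          You can check this yourself (a quick sketch, assuming a
          hyper-threaded node):
          <br>
          <br>
          $ salloc -n1 -c1
          <br>
          $ scontrol show job $SLURM_JOB_ID | grep -o 'NumCPUs=[0-9]*'
          <br>
          <br>
          With CR_Core this will typically report NumCPUs=2 on such a
          node, even though you asked for one.
          <br>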
          <br>
          "Lying" to Slurm about the actual hardware-setup helps to
          avoid this behaviour but are you really confident with
          potentially running two different jobs on the hyper-threads of
          the same core?
          <br>
          <br>
          Regards,
          <br>
          Hermann
          <br>
          <br>
          On 2/12/23 22:04, Sebastian Schmutzhard-Höfler wrote:
          <br>
          <blockquote type="cite">Hi Hermann,
            <br>
            <br>
            Using your suggested settings did not work for us.
            <br>
            <br>
            When trying to allocate a single thread with
            --cpus-per-task=1, it still reserved a whole core (two
            threads). On the other hand, when requesting an even number
            of threads, it does what it should.
            <br>
            <br>
            However, I could make it work by using
            <br>
            <br>
            SelectTypeParameters=CR_Core
            <br>
            NodeName=nodename Sockets=2 CoresPerSocket=128
            ThreadsPerCore=1
            <br>
            <br>
            instead of
            <br>
            <br>
            SelectTypeParameters=CR_Core
            <br>
            NodeName=nodename Sockets=2 CoresPerSocket=64
            ThreadsPerCore=2
            <br>
            <br>
            So your suggestion brought me in the right direction.
            Thanks!
            <br>
            <br>
            If anyone thinks this is complete nonsense, please let me
            know!
            <br>
            <br>
            Best wishes,
            <br>
            <br>
            Sebastian
            <br>
            <br>
            On 11.02.23 11:13, Hermann Schwärzler wrote:
            <br>
            <blockquote type="cite">Hi Sebastian,
              <br>
              <br>
              we did a similar thing just recently.
              <br>
              <br>
              We changed our node settings from
              <br>
              <br>
              NodeName=DEFAULT CPUs=64 Boards=1 SocketsPerBoard=2
              CoresPerSocket=32 ThreadsPerCore=2
              <br>
              <br>
              to
              <br>
              <br>
              NodeName=DEFAULT Boards=1 SocketsPerBoard=2
              CoresPerSocket=32 ThreadsPerCore=2
              <br>
              <br>
              in order to make the use of individual hyper-threads possible
              (we use this in combination with
              <br>
              SelectTypeParameters=CR_Core_Memory).
              <br>
              <br>
              This works as expected: after this, when e.g. asking for
              --cpus-per-task=4 you will get 4 hyper-threads (2 cores)
              per task (unless you also specify e.g.
              "--hint=nomultithread").
              <br>
              <br>
              So you might try to remove the "CPUs=256" part of your
              node-specification and let Slurm do the calculation of the
              number of CPUs itself.
              <br>
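              For your node, the line would then look something like:
              <br>
              <br>
              NodeName=nodename Sockets=2 CoresPerSocket=64 ThreadsPerCore=2
              <br>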
              <br>
              <br>
              BTW, on a side-note: as most of our users do not bother to
              use hyper-threads, or do not even want to because their
              programs might suffer from doing so, we made
              "--hint=nomultithread" the default in our installation by
              adding
              <br>
              <br>
              CliFilterPlugins=cli_filter/lua
              <br>
              <br>
              to our slurm.conf and creating a cli_filter.lua file in
              the same directory as slurm.conf that contains this:
              <br>
              <br>
              function slurm_cli_setup_defaults(options, early_pass)
              <br>
                      options['hint'] = 'nomultithread'
              <br>
              <br>
                      return slurm.SUCCESS
              <br>
              end
              <br>
              <br>
              (see also
<a class="moz-txt-link-freetext" href="https://github.com/SchedMD/slurm/blob/master/etc/cli_filter.lua.example">https://github.com/SchedMD/slurm/blob/master/etc/cli_filter.lua.example</a>).<br>
              So if users really want to use hyper-threads, they have to
              add "--hint=multithread" to their job/allocation options.
              <br>
              <br>
              Regards,
              <br>
              Hermann
              <br>
              <br>
              On 2/10/23 00:31, Sebastian Schmutzhard-Höfler wrote:
              <br>
              <blockquote type="cite">Dear all,
                <br>
                <br>
                we have a node with 2 x 64 CPUs (with two threads each)
                and 8 GPUs, running slurm 22.05.5
                <br>
                <br>
                In order to make use of individual threads, we changed
                <br>
                <br>
                SelectTypeParameters=CR_Core
                <br>
                NodeName=nodename CPUs=256 Sockets=2 CoresPerSocket=64
                ThreadsPerCore=2
                <br>
                <br>
                to
                <br>
                <br>
                SelectTypeParameters=CR_CPU NodeName=nodename CPUs=256
                <br>
                <br>
                We are now able to allocate individual threads to jobs,
                despite the following error in slurmd.log:
                <br>
                <br>
                error: Node configuration differs from hardware:
                CPUs=256:256(hw) Boards=1:1(hw)
                SocketsPerBoard=256:2(hw) CoresPerSocket=1:64(hw)
                ThreadsPerCore=1:2(hw)
                <br>
                <br>
                <br>
                However, it appears that since this change, we can only
                make use of 4 out of the 8 GPUs.
                <br>
                The output of "sinfo -o %G" might be relevant.
                <br>
                <br>
                In the first situation it was
                <br>
                <br>
                $ sinfo -o %G
                <br>
                GRES
                <br>
                gpu:A100:8(S:0,1)
                <br>
                <br>
                Now it is:
                <br>
                <br>
                $ sinfo -o %G
                <br>
                GRES
                <br>
gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
                <br>
                <br>
                Has anyone faced this or a similar issue and can give
                me some directions?
                <br>
                Best wishes
                <br>
                <br>
                Sebastian
                <br>
                <br>
                <br>
                <br>
              </blockquote>
              <br>
            </blockquote>
            <br>
          </blockquote>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>