<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>I suspect you are misunderstanding how the flow works.</p>
    <ol>
      <li>You request X nodes to do some work.</li>
      <li>You start a job that uses all the nodes.</li>
      <li>The job runs until every task in it has finished.</li>
      <li>Only then are the resources released to be used again.</li>
    </ol>
    <p>If your workload allows it, you probably want a job array, in
      which each part runs as a separate job, independently of the
      others. The resources for a part are then released as soon as
      that part completes.</p>
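    <p>As a rough sketch only, here is how your script might look as a
      job array. It assumes the work can be split into 36 parts that
      do not need to communicate with each other, and that the program
      can be told which part to run. The <code>--part</code> option is
      hypothetical (I do not know what arguments robotune accepts);
      everything else is taken from your script.</p>
    <pre>#!/bin/bash
#SBATCH --job-name=robotune
#SBATCH --array=0-35             # assumption: 36 independent parts, indexed 0..35
#SBATCH --nodes=1                # each array task gets its own single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36
#SBATCH --time=5-12:00:00
#SBATCH --output="%x-%A_%a.out"  # %A = array job ID, %a = array task index

module purge
module load gnu8/8.3.0
module load mpich/3.3

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task.
# "--part" is a hypothetical option, shown only to illustrate passing the index.
srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune --part "${SLURM_ARRAY_TASK_ID}" &lt;ARGS&gt;</pre>
    <p>Each array task is scheduled and finishes on its own, so its
      node goes back to idle as soon as that task exits.</p>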
    <p>Bottom line:<br>
      Resources are not released when they stop being used; they are
      released when the job is done.</p>
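    <p>If you want to confirm this for your own job, something like the
      following should show a single job that is still running and
      still holding all of its nodes. Replace the job ID placeholder
      with your own; the format string is just one possible choice.</p>
    <pre># %i = job ID, %T = job state, %D = node count, %R = node list (or pending reason)
squeue -j &lt;jobid&gt; -o "%.10i %.10T %.6D %R"</pre>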
    <p>Brian Andrus<br>
    </p>
    <div class="moz-cite-prefix">On 6/26/2021 11:59 PM, Grigory Ptashko
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:3EC15299-3AB3-4993-8F1B-3D2718D96F5A@gmail.com">
      <pre class="moz-quote-pre" wrap="">Hello!

Recently I've started using MPI on our HPC-cluster.
It has 40 nodes.
It runs SLURM.
I'm new to MPI and SLURM but so far everything works fine except one thing.
In short: nodes that finished calculation do not become idle.
Only after all the nodes finished calculations they all become idle.

Here's an example of a typical node:

$ scontrol show nodes cn-022
NodeName=cn-022 Arch=x86_64 CoresPerSocket=18
CPUAlloc=36 CPUTot=36 CPULoad=1.01
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=cn-022 NodeHostName=cn-022 Version=18.08
OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
RealMemory=1 AllocMem=0 FreeMem=507942 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=normal,long,shared
BootTime=2021-06-07T20:45:06 SlurmdStartTime=2021-06-07T20:43:27
CfgTRES=cpu=36,mem=1M,billing=36
AllocTRES=cpu=36,mem=1M,billing=36
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


Here's my sbatch script:

#!/bin/bash
#SBATCH --job-name=robotune
#SBATCH --nodes=36
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=36
#SBATCH --time=5-12:00:00
#SBATCH --output="%x-%N-%j.out"

module purge
module load gnu8/8.3.0
module load mpich/3.3

srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune <ARGS>


And here's the CPU load of all nodes allocated for this command:

$ scontrol show nodes cn-[005-040] | egrep "CPULoad"
CPUAlloc=36 CPUTot=36 CPULoad=26.53
CPUAlloc=36 CPUTot=36 CPULoad=18.67
CPUAlloc=36 CPUTot=36 CPULoad=4.63
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.00
CPUAlloc=36 CPUTot=36 CPULoad=1.02
CPUAlloc=36 CPUTot=36 CPULoad=0.98
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=0.99
CPUAlloc=36 CPUTot=36 CPULoad=1.02
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=0.99
CPUAlloc=36 CPUTot=36 CPULoad=0.99
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=0.99
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=0.99
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01
CPUAlloc=36 CPUTot=36 CPULoad=1.01


And:

$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 11-00:00:0 37 alloc cn-[001,005-040]
normal* up 11-00:00:0 3 idle cn-[002-004]
long up 31-00:00:0 37 alloc cn-[001,005-040]
long up 31-00:00:0 3 idle cn-[002-004]
shared up infinite 26 alloc cn-[015-040]


So as you see almost all nodes finished calculations (CPULoad 1%).
Only three are working. But those who finished do not become idle!

I want finished nodes to become idle. What I am possibly doing wrong?

Thank you,
Grigory.

</pre>
    </blockquote>
  </body>
</html>