[slurm-users] nodes that finished calculation do not become idle

Brian Andrus toomuchit at gmail.com
Sun Jun 27 14:40:30 UTC 2021


I suspect you are misunderstanding how the flow works.

 1. You request X nodes to do some work.
 2. You start a job that uses all the nodes.
 3. Job runs until everything is done.
 4. Resources are released back to be used again (see the sketch below).
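
To make step 4 concrete, here is a minimal, hypothetical sketch (job name and
sleep times invented): even after most of a job's tasks have exited, the nodes
stay allocated until the batch script itself finishes.

#!/bin/bash
#SBATCH --job-name=flow-demo   # hypothetical name
#SBATCH --nodes=4
#SBATCH --ntasks=4

srun sleep 10         # first step: one task per node, all four nodes busy
srun -n1 sleep 600    # second step: a single task, yet sinfo still shows
                      # all four nodes as "alloc" until this script exits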

If your workload allows it, you probably want a job array, where each 
element runs independently of the others. That way the resources for an 
element are released as soon as that element completes.
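
As a rough sketch only (this assumes the 36 pieces of work are truly
independent and do not exchange MPI messages with each other; how your
program picks its slice of the work is up to you), your script below could
become something like:

#!/bin/bash
#SBATCH --job-name=robotune
#SBATCH --array=0-35                 # 36 independent elements
#SBATCH --nodes=1                    # each element allocates only its own node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36
#SBATCH --time=5-12:00:00
#SBATCH --output="%x-%N-%A_%a.out"   # %A = array job id, %a = element index

module purge
module load gnu8/8.3.0
module load mpich/3.3

# Each element sees its own SLURM_ARRAY_TASK_ID, which you could use
# to select that element's slice of the work.
srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune <ARGS>

Each element's node goes back into the pool the moment that element
finishes, instead of waiting for all 36.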

Bottom line:
Resources are not released when they stop being used; they are released 
when the job is done.
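
As a side note, with an array you can watch the elements drain individually:
each one shows up in squeue as <array_job_id>_<index> and releases its node
the moment it completes. For example (username taken from the path in your
script; substitute your own):

$ squeue -u ptashko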

Brian Andrus

On 6/26/2021 11:59 PM, Grigory Ptashko wrote:
> Hello!
>
> Recently I've started using MPI on our HPC cluster. It has 40 nodes and 
> runs SLURM.
> I'm new to MPI and SLURM, but so far everything works fine except for one thing.
> In short: nodes that have finished their calculation do not become idle.
> Only after all the nodes have finished do they all become idle.
>
> Here's an example of a typical node:
>
> $ scontrol show nodes cn-022
> NodeName=cn-022 Arch=x86_64 CoresPerSocket=18
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> AvailableFeatures=(null)
> ActiveFeatures=(null)
> Gres=(null)
> NodeAddr=cn-022 NodeHostName=cn-022 Version=18.08
> OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
> RealMemory=1 AllocMem=0 FreeMem=507942 Sockets=2 Boards=1
> State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> Partitions=normal,long,shared
> BootTime=2021-06-07T20:45:06 SlurmdStartTime=2021-06-07T20:43:27
> CfgTRES=cpu=36,mem=1M,billing=36
> AllocTRES=cpu=36,mem=1M,billing=36
> CapWatts=n/a
> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> Here's my sbatch script:
>
> #!/bin/bash
> #SBATCH --job-name=robotune
> #SBATCH --nodes=36
> #SBATCH --ntasks=36
> #SBATCH --cpus-per-task=36
> #SBATCH --time=5-12:00:00
> #SBATCH --output="%x-%N-%j.out"
>
> module purge
> module load gnu8/8.3.0
> module load mpich/3.3
>
> srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune <ARGS>
>
>
> And here's the CPU load of all nodes allocated for this command:
>
> $ scontrol show nodes cn-[005-040] | egrep "CPULoad"
> CPUAlloc=36 CPUTot=36 CPULoad=26.53
> CPUAlloc=36 CPUTot=36 CPULoad=18.67
> CPUAlloc=36 CPUTot=36 CPULoad=4.63
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.00
> CPUAlloc=36 CPUTot=36 CPULoad=1.02
> CPUAlloc=36 CPUTot=36 CPULoad=0.98
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=0.99
> CPUAlloc=36 CPUTot=36 CPULoad=1.02
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=0.99
> CPUAlloc=36 CPUTot=36 CPULoad=0.99
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=0.99
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=0.99
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>
>
> And:
>
> $ sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> normal* up 11-00:00:0 37 alloc cn-[001,005-040]
> normal* up 11-00:00:0 3 idle cn-[002-004]
> long up 31-00:00:0 37 alloc cn-[001,005-040]
> long up 31-00:00:0 3 idle cn-[002-004]
> shared up infinite 26 alloc cn-[015-040]
>
>
> So as you can see, almost all the nodes have finished their calculations 
> (CPULoad around 1).
> Only three are still working, but the ones that have finished do not become idle!
>
> I want finished nodes to become idle. What am I doing wrong?
>
> Thank you,
> Grigory.
>