[slurm-users] nodes that finished calculation do not become idle

Grigory Ptashko grigory.ptashko at gmail.com
Fri Jul 2 15:28:12 UTC 2021


Job array is working like magic for me.
Thank you very much for the hint!
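For anyone who finds this thread later, here is a minimal sketch of how the submission script quoted further down could be restructured as an array job. It assumes the work splits into 36 independent parts, each fitting on one node; the mapping of the original <ARGS> onto array task IDs is hypothetical and depends on the application.

#!/bin/bash
#SBATCH --job-name=robotune
#SBATCH --array=0-35            # 36 independent tasks instead of one 36-node job
#SBATCH --nodes=1               # each array task gets its own node...
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36      # ...and all 36 cores on it
#SBATCH --time=5-12:00:00
#SBATCH --output="%x-%N-%A_%a.out"   # %A = array job ID, %a = array task index

module purge
module load gnu8/8.3.0
module load mpich/3.3

# Each task selects its own piece of work via SLURM_ARRAY_TASK_ID.
# How the original <ARGS> map onto the task ID is an assumption here.
srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune <ARGS for task ${SLURM_ARRAY_TASK_ID}>

Because each array task is its own allocation, its node goes back to idle the moment that task ends, rather than waiting for the slowest of the 36 parts.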

> On 27 Jun 2021, at 17:40, Brian Andrus <toomuchit at gmail.com> wrote:
> 
> I suspect you are misunderstanding how the flow works.
> 
> You request X nodes to do some work.
> You start a job that uses all the nodes.
> Job runs until everything is done.
> Resources are released back to be used again.
> If your job allows it, you probably want an array job, where each part runs independently of the others. That way, the resources for a part are released as soon as that part is complete.
> 
> Bottom line:
> Resources are not released when they stop being used; they are released when the job is done.
> 
> Brian Andrus
> 
> On 6/26/2021 11:59 PM, Grigory Ptashko wrote:
>> Hello!
>> 
>> Recently I've started using MPI on our HPC cluster.
>> It has 40 nodes.
>> It runs SLURM.
>> I'm new to MPI and SLURM, but so far everything works fine except for one thing.
>> In short: nodes that have finished their calculation do not become idle.
>> They only become idle once all of the nodes have finished their calculations.
>> 
>> Here's an example of a typical node:
>> 
>> $ scontrol show nodes cn-022
>> NodeName=cn-022 Arch=x86_64 CoresPerSocket=18
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> AvailableFeatures=(null)
>> ActiveFeatures=(null)
>> Gres=(null)
>> NodeAddr=cn-022 NodeHostName=cn-022 Version=18.08
>> OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
>> RealMemory=1 AllocMem=0 FreeMem=507942 Sockets=2 Boards=1
>> State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>> Partitions=normal,long,shared
>> BootTime=2021-06-07T20:45:06 SlurmdStartTime=2021-06-07T20:43:27
>> CfgTRES=cpu=36,mem=1M,billing=36
>> AllocTRES=cpu=36,mem=1M,billing=36
>> CapWatts=n/a
>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>> 
>> 
>> Here's my sbatch script:
>> 
>> #!/bin/bash
>> #SBATCH --job-name=robotune
>> #SBATCH --nodes=36
>> #SBATCH --ntasks=36
>> #SBATCH --cpus-per-task=36
>> #SBATCH --time=5-12:00:00
>> #SBATCH --output="%x-%N-%j.out"
>> 
>> module purge
>> module load gnu8/8.3.0
>> module load mpich/3.3
>> 
>> srun --mpi=pmi2 /home/ptashko/work/robomarket/cmd/tune/robotune <ARGS>
>> 
>> 
>> And here's the CPU load of all nodes allocated for this command:
>> 
>> $ scontrol show nodes cn-[005-040] | egrep "CPULoad"
>> CPUAlloc=36 CPUTot=36 CPULoad=26.53
>> CPUAlloc=36 CPUTot=36 CPULoad=18.67
>> CPUAlloc=36 CPUTot=36 CPULoad=4.63
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.00
>> CPUAlloc=36 CPUTot=36 CPULoad=1.02
>> CPUAlloc=36 CPUTot=36 CPULoad=0.98
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=0.99
>> CPUAlloc=36 CPUTot=36 CPULoad=1.02
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=0.99
>> CPUAlloc=36 CPUTot=36 CPULoad=0.99
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=0.99
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=0.99
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> CPUAlloc=36 CPUTot=36 CPULoad=1.01
>> 
>> 
>> And:
>> 
>> $ sinfo
>> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
>> normal* up 11-00:00:0 37 alloc cn-[001,005-040]
>> normal* up 11-00:00:0 3 idle cn-[002-004]
>> long up 31-00:00:0 37 alloc cn-[001,005-040]
>> long up 31-00:00:0 3 idle cn-[002-004]
>> shared up infinite 26 alloc cn-[015-040]
>> 
>> 
>> So as you can see, almost all of the nodes have finished their calculations (CPULoad of roughly 1).
>> Only three are still working. But the ones that finished do not become idle!
>> 
>> I want the finished nodes to become idle. What am I possibly doing wrong?
>> 
>> Thank you,
>> Grigory.
>> 
