[slurm-users] sbatch overallocation

Max Quast max at quast.de
Sat Oct 10 11:05:51 UTC 2020


Dear slurm-users, 

 

I have built a Slurm cluster consisting of two nodes (Ubuntu 20.04.1, Slurm
20.02.5):

 

    # COMPUTE NODES
    GresTypes=gpu
    NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
    PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP

 

The slurmctld daemon runs on a separate Ubuntu system on which no slurmd is
installed.
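
As a quick sanity check that the controller has registered these resources
as configured, the standard Slurm status commands can be used, e.g.:

    sinfo -N -l                  # per-node state with CPU and memory counts
    scontrol show node lsm216    # full record for one node (CPUTot, CPUAlloc, Gres, ...)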

 

If a user submits this script (sbatch srun2.bash):

 

    #!/bin/bash
    #SBATCH -N 2 -n9

    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
    srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
    wait

 

eight job steps, each with nine tasks, are launched and distributed across the two nodes.
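
For orientation: every backgrounded srun inherits the full job allocation, so
the eight concurrent steps place 8 x 9 = 72 tasks on an allocation sized for
nine tasks. A minimal sketch of one way to keep steps from stacking on the
same cores, assuming each case should really get nine dedicated tasks, is to
size the allocation for all steps combined and give each srun an explicit task
count plus the step-level --exclusive flag, which defers a step until unused
CPUs exist inside the allocation (later Slurm releases moved this behaviour to
--exact):

    #!/bin/bash
    # Sketch: 8 cases x 9 tasks each = 72 tasks in total.
    #SBATCH -N 2 -n 72

    for i in $(seq 10 17); do
        # Step-level --exclusive: dedicate CPUs to this step and defer it
        # while the allocation has no free CPUs left.
        srun -n 9 --exclusive \
            pimpleFoam -case /mnt/NFS/users/quast/channel395-$i -parallel > /dev/null &
    done
    wait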

 

If several such scripts are submitted at the same time, all of the srun
commands are executed even though no free cores are available, so the nodes
become overallocated.

How can this be prevented?
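
A minimal slurm.conf sketch, assuming cores should be tracked as consumable
resources so that jobs are only started while free cores exist
(OverSubscribe=NO is already the partition default and is only spelled out
here for clarity):

    # slurm.conf sketch: schedule on individual cores and memory
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP OverSubscribe=NO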

 

Thx :)

 

Greetings 

max

 
