[slurm-users] sbatch overallocation
Max Quast
max at quast.de
Sat Oct 10 11:05:51 UTC 2020
Dear slurm-users,
I built a Slurm system consisting of two nodes (Ubuntu 20.04.1, Slurm
20.02.5):
# COMPUTE NODES
GresTypes=gpu
NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP
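Since only part of slurm.conf is quoted here, the effective resource-selection settings may matter for the question below. A small check (partition name taken from the config above, hostnames lsm216/lsm217 assumed):

scontrol show config | grep -i select    # SelectType / SelectTypeParameters in effect
scontrol show partition admin            # OverSubscribe setting and limits of the partition
slurmd -C                                # run on a compute node: prints the hardware-derived node line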
The slurmctld is running on a separate Ubuntu system on which no slurmd is
installed.
If a user executes this script (sbatch srun2.bash):
#!/bin/bash
#SBATCH -N 2 -n9
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
wait
Eight srun steps with nine tasks are launched and distributed across the two nodes.
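As a side note on where the nine tasks come from: as far as I can tell, a bare srun inside a batch job picks up the allocation size from the environment that sbatch exports (SLURM_NTASKS and related variables) unless -n is given explicitly. A minimal sketch to confirm this, placed in the batch script before the srun lines (not part of the original script):

echo "allocation: ${SLURM_JOB_NUM_NODES} nodes, ${SLURM_NTASKS} tasks"   # exported by sbatch
srun hostname | sort | uniq -c   # one line per task; shows how a bare srun spreads over the allocation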
If several such scripts are started at the same time, all of the srun commands
are executed even though no free cores are available, so the nodes become
overallocated.
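For illustration, the behaviour can be reproduced by submitting the same script several times in a row (script name as above; how many copies it takes depends on the free cores):

sbatch srun2.bash
sbatch srun2.bash
sbatch srun2.bash
squeue                      # as described above, the additional jobs start running as well
scontrol show node lsm216   # compare CPUAlloc/CPUTot with what is actually running on the node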
How can this be prevented?
Thx :)
Greetings
max