[slurm-users] sbatch overallocation

Sat Oct 10 13:02:33 UTC 2020

Hi;

You can submit each pimpleFoam run as a separate job. Or, if you really 
want to submit them as a single job, you can use a program such as GNU 
parallel to run only as many of them at a time as you have CPUs:

https://www.gnu.org/software/parallel/
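
For the first option, a minimal sketch of submitting each case as its own job could look like the script below. The case path and the `-N 2 -n 9` request are copied from the script in your mail; the `submit_cases` helper and the `DRY_RUN` switch are hypothetical names used only for this illustration. It is an untested outline, not a drop-in solution.

```bash
#!/bin/bash
# Sketch: submit one Slurm job per pimpleFoam case instead of eight
# background srun steps inside a single job.
# Assumption: CASE_ROOT and the -N 2 -n 9 request mirror the original script.
CASE_ROOT=/mnt/NFS/users/quast

# Hypothetical helper: takes case suffixes (10, 11, ...) as arguments.
# With DRY_RUN=1 (the default here) it only prints the sbatch commands;
# the printed line is for inspection and omits the shell quoting.
submit_cases() {
    local i
    for i in "$@"; do
        local cmd=(sbatch -N 2 -n 9 --wrap
            "srun pimpleFoam -case ${CASE_ROOT}/channel395-${i} -parallel > /dev/null")
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "${cmd[*]}"
        else
            "${cmd[@]}"
        fi
    done
}

# Cases 10 to 17, as in the original script.
submit_cases {10..17}
```

Run it once as is to check the printed commands, then with DRY_RUN=0 to actually submit. Each case then becomes a separate job with its own resource request. Whether jobs wait in the queue until cores are free depends on the scheduling configuration of the cluster (for example SelectType in slurm.conf), which is not shown in this thread.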

regards;

Ahmet M.

On 10.10.2020 at 14:05, Max Quast wrote:
>
> Dear slurm-users,
>
> I built a slurm system consisting of two nodes (Ubuntu 20.04.1, slurm 
> 20.02.5):
>
> # COMPUTE NODES
> GresTypes=gpu
> NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP
>
> The slurmctld is running on a separate Ubuntu system where no slurmd is 
> installed.
>
> If a user executes this script (sbatch srun2.bash)
>
> #!/bin/bash
> #SBATCH -N 2 -n9
>
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
> srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
>
> wait
>
> 8 job steps with 9 tasks each are launched and distributed across the two nodes.
>
> If more such scripts get started at the same time, all the srun 
> commands will be executed even though no free cores are available. So 
> the nodes are overallocated.
>
> How can this be prevented?
>
> Thx :)
>
> Greetings
>
> max
>