[slurm-users] srun jobfarming hassle question

Ohlerich, Martin Martin.Ohlerich at lrz.de
Wed Jan 18 14:22:27 UTC 2023


Sure ;) My example was just for quick reproducibility.

The complete job farm script is (if that's of interest):

----------------------------------------->

#!/bin/bash
#SBATCH -J jobfarm_test
#SBATCH -o log.%x.%j.%N.out
#SBATCH -D ./
#SBATCH --mail-type=NONE
#SBATCH --time=00:05:00
#SBATCH --export=NONE
#SBATCH --get-user-env
#SBATCH --clusters=...
#SBATCH --partition=..
#SBATCH --qos=lrz_admin
#SBATCH --nodes=2
#SBATCH --ntasks=4               # needed for SLURM_NTASKS, used by parallel below

module load slurm_setup    # LRZ specific
module load parallel             # GNU parallel

# Hyperthreading
export OMP_NUM_THREADS=28
export MY_SLURM_PARAMS="-N 1 -n 1 -c 28 --threads-per-core=2 --mem=27G --exact --export=ALL --cpu_bind=verbose,cores --mpi=none"

## Not Hyperthreading
#export OMP_NUM_THREADS=14
#export MY_SLURM_PARAMS="-N 1 -n 1 -c 28 --threads-per-core=2 --mem=27G --exact --export=ALL --cpu_bind=verbose,cores --mpi=none"

export MYEXEC=/lrz/sys/tools/placement_test_2021/bin/placement-test.omp_only
# Note: bash cannot export arrays; PARAMS is only used in this shell, so plain assignment suffices.
PARAMS=("-d 20" "-d 10" "-d 20" "-d 10" "-d 20" "-d 10" "-d 20" "-d 10")

task() {
   echo "srun $MY_SLURM_PARAMS $MYEXEC $2 &> log2.$1"
   srun $MY_SLURM_PARAMS $MYEXEC $2 &> "log2.$1"
}
export -f task

parallel -P "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"
----------------------------------------->

The good thing here is that users typically only need to modify the SBATCH header and the exported environment variables.


But Magnus (thanks for the link!) is right. This is still far from a feature-rich job- or task-farming framework, where at least some overview of passed/failed/missing task statistics is available, etc.

But as a solution for a few dozen tasks, the above is imho feasible and flexible (as long as srun keeps playing by the rules).
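A small amount of pass/fail bookkeeping could be added without extra tooling, by letting the background tasks propagate their exit status and counting failures at the end. A minimal sketch in plain bash, where a dummy test stands in for the srun line from task() above:

```shell
#!/bin/bash
# Minimal pass/fail bookkeeping (a sketch): collect the PIDs of background
# tasks and count nonzero exit codes via wait. The "[ ... ]" test is a dummy
# stand-in for the real srun command; task 2 deliberately fails.
pids=()
for i in 1 2 3; do
   ( [ "$i" -ne 2 ] ) &   # dummy task body
   pids+=($!)
done

fail=0
for p in "${pids[@]}"; do
   wait "$p" || fail=$((fail + 1))
done

echo "failed: $fail of ${#pids[@]}"   # prints: failed: 1 of 3
```

The same idea carries over to the GNU parallel variant, since parallel records each task's exit value when asked to keep a job log.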


For 1000 tasks, sure, something else is needed. I played with Julia's pmap (https://doku.lrz.de/display/PUBLIC/FAQ%3A+Julia+on+SuperMUC-NG+and+HPC+Systems#FAQ:JuliaonSuperMUCNGandHPCSystems-MoreExamplesandExampleUseCases), which, however, also reacted quite negatively to the srun interface changes. So I drifted away from it again. Maybe too easily :scratch_head:

Anyway, Magnus, I will try it.


Huge thanks to you all!

Kind regards,

Martin





________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ward Poelmans <ward.poelmans at vub.be>
Sent: Wednesday, 18 January 2023 15:00
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] srun jobfarming hassle question

Hi Martin,

Just a tip: use GNU parallel instead of a for loop. Much easier and more powerful.

Like:

parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact <command> ::: *.input


Ward
