Hi everybody,
I am (along with others) a little puzzled by a statement in the documentation about heterogeneous job steps inside heterogeneous (het) jobs. The docs state (https://slurm.schedmd.com/archive/slurm-24.11.5/heterogeneous_jobs.html#het_...):
You also cannot request heterogeneous steps from within a heterogeneous job. (A)
On a very small Slurm test installation with just two nodes, the following het job, which requests het steps (it does, right?), runs fine:
$ cat hetjob-steps.sh
#!/bin/bash
#SBATCH --mem-per-cpu=2g --nodes=1 --cpus-per-task=8
#SBATCH hetjob
#SBATCH --mem-per-cpu=1g --nodes=1 --cpus-per-task=4
srun -l --cpus-per-task=4 nproc : -l --cpus-per-task=2 nproc
$ cat slurm-125.out
1: 4
2: 2
3: 2
0: 4
The output looks reasonable, so the quote above does not seem to apply: one can apparently request het steps in a het job. Or am I wrong?
The intro of that section also gives the impression that het job steps are a convenience feature that does not require het jobs, but it does not explicitly rule out using het steps in het jobs:
Slurm version 20.11 introduces the ability to request heterogeneous job steps from within a non-heterogeneous job allocation. This allows you the flexibility to have different layouts for job steps without requiring the use of heterogeneous jobs, where having separate jobs for the components may be undesirable.
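For comparison, here is a minimal sketch of what I understand that paragraph to describe: a het step inside one ordinary allocation, with no "#SBATCH hetjob" separator. I have not run this variant. The file name hetstep-only.sh and all resource values are made up for illustration, and the snippet only writes the batch script to disk so that it can be inspected before submitting:

```shell
# Write a hypothetical batch script (assumed name: hetstep-only.sh).
# Nothing is submitted here; sbatch would be the next, separate step.
cat > hetstep-only.sh <<'EOF'
#!/bin/bash
# One regular allocation only (assumed sizes: 1 node, 2 tasks x 4 CPUs).
#SBATCH --nodes=1 --ntasks=2 --cpus-per-task=4 --mem-per-cpu=1g
# Het step: two components with different layouts, separated by " : ".
srun -l --ntasks=1 --cpus-per-task=4 nproc : -l --ntasks=1 --cpus-per-task=2 nproc
EOF
```

If the intro really means what it says, submitting this with "sbatch hetstep-only.sh" should be the supported case, and my script above (het job plus het step) would be the one that statement (A) is about.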
So what does the initial statement (A) actually mean, then? Or did I just get lucky with an example that is not actually supported?
A short clarification would be helpful.
Thanks in advance
Steffen