[slurm-users] ntasks and cpus-per-task

Loris Bennett loris.bennett at fu-berlin.de
Fri Feb 23 03:50:14 MST 2018


Hi Chris,

Christopher Benjamin Coffey <Chris.Coffey at nau.edu> writes:

> Hi Loris,
>
>> But that's only the case if the program is started with srun or some
>> form of mpirun.  Otherwise the program just gets started once on one
>> core and the other cores just idle.
>
> Yes, maybe that’s true about what you say when not using srun. I'm not
> sure, as we tell everyone to use srun to launch every type of task.

OK, I'm confused now.  Our main culprit for producing processes with
incorrect affinity is ORCA [1].  It uses OpenMPI but also likes to start
processes asynchronously via SSH on the nodes allocated to the job.  Our
users run their jobs via batch files containing, say,

  #SBATCH --ntasks=8
  ...
  $ORCA_PATH/orca ...

However, if I run an ORCA job with 'srun', i.e.

  #SBATCH --ntasks=8
  ...
  srun $ORCA_PATH/orca ...

the program is run 8 times, once per task, with all 8 instances writing
to the same log and output files.
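
To illustrate with a harmless stand-in for ORCA, here is a minimal sketch
of a batch script.  The echo commands are placeholders of my own, not
anything from our real job files; SLURM_PROCID is the task rank that srun
sets for each task it launches.

```shell
#!/bin/bash
#SBATCH --ntasks=8

# Without srun: the command is started once, by the batch script itself,
# on the first node of the allocation.
echo "plain: started once on $(hostname)"

# With srun and no further options: one copy per task, i.e. 8 copies here.
srun bash -c 'echo "srun: task $SLURM_PROCID on $(hostname)"'

# Restricting the job step to a single task starts just one copy.
srun --ntasks=1 bash -c 'echo "srun -n1: task $SLURM_PROCID on $(hostname)"'
```

The second command should print eight lines, one per task, which is the
same mechanism that gives me eight ORCA instances sharing one output file.
Whether a single-task step would be suitable for ORCA is a separate
question, since ORCA starts its own MPI processes.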

Is ORCA just a pathological exception to the idea that it's always good
to use 'srun'?  (As it causes well over 95% of our affinity problems, it
is already pathological in that sense.)

Cheers,

Loris

Footnotes: 
[1]  https://orcaforum.cec.mpg.de/

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


