Hi Chris,

Christopher Benjamin Coffey <Chris.Coffey at nau.edu> writes:

> Hi Loris,
>> But that's only the case if the program is started with srun or some
>> form of mpirun.  Otherwise the program just gets started once on one
>> core and the other cores just idle.
> Yes, what you say may well be true when not using srun. I'm not
> sure, as we tell everyone to use srun to launch every type of task.

OK, I'm confused now.  Our main culprit for producing processes with
incorrect affinity is ORCA [1].  It uses Open MPI but also likes to start
processes asynchronously via SSH within the node set.  Our users run
their jobs via batch files containing, say:

  #SBATCH --ntasks=8
  $ORCA_PATH/orca ...

However, if I run an ORCA job with 'srun', i.e.

  #SBATCH --ntasks=8
  srun $ORCA_PATH/orca ...

this results in the program being run 8 times, with all 8 instances
writing to the same log and output files.
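
To make the difference concrete, here is a minimal sketch with 'hostname'
standing in for the real program (my substitution for illustration; it is
not taken from our actual job scripts).  It assumes a default setup in
which a job step started by 'srun' inherits the task count of the
allocation:

```shell
#!/bin/bash
#SBATCH --ntasks=8

# Launched via srun: one copy per task, so this should print 8 lines.
srun hostname

# Limiting the step to a single task should give one line again.
srun --ntasks=1 hostname

# For comparison, no srun: runs once, inside the batch script itself.
hostname
```

As far as I understand it, '--ntasks=1' on the 'srun' line only controls
how many copies Slurm itself starts; whether that helps with a program
which then starts its own processes, as ORCA does, is a separate question.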

Is ORCA just a pathological exception to the idea that it's always good
to use 'srun'?  (As it causes well over 95% of our affinity problems, it
is already pathological in that sense.)



[1]  https://orcaforum.cec.mpg.de/

