[slurm-users] Strange error, submission denied

Henkel henkel at uni-mainz.de
Wed Feb 20 10:59:46 UTC 2019

Hi Chris,
Hi Marcus,

Just want to understand the cause, too. I'll try to sum it up.

Chris, you have

CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2


srun -C gpu -N 1 --ntasks-per-node=80 hostname


Marcus has configured

CPUs=48  Sockets=4 CoresPerSocket=12 ThreadsPerCore=2
(slurmd -C says CPUs=96 Boards=1 SocketsPerBoard=4 CoresPerSocket=12




srun -n 48 WORKS

srun -N 1 --ntasks-per-node=48 DOESN'T WORK.
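
To make the difference between the two node definitions easier to see, here is a small sketch that only does the arithmetic on the spec lines quoted above. The helper names (parse_node_spec, topology) are made up for this illustration and are not part of Slurm; the sketch says nothing about how Slurm itself selects CPUs.

```python
import re


def parse_node_spec(spec: str) -> dict:
    """Turn a 'Key=Value Key=Value ...' node line into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", spec)}


def topology(spec: str) -> dict:
    """Compare the configured CPU count with what the topology implies."""
    fields = parse_node_spec(spec)
    # The configured line uses Sockets, slurmd -C prints SocketsPerBoard.
    if "Sockets" in fields:
        sockets = fields["Sockets"]
    else:
        sockets = fields["SocketsPerBoard"] * fields.get("Boards", 1)
    cores = sockets * fields["CoresPerSocket"]
    threads = cores * fields["ThreadsPerCore"]
    return {
        "configured_cpus": fields["CPUs"],
        "physical_cores": cores,
        "hardware_threads": threads,
    }


chris = topology("CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2")
marcus = topology("CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2")

print(chris)   # configured_cpus=80, physical_cores=40, hardware_threads=80
print(marcus)  # configured_cpus=48, physical_cores=48, hardware_threads=96
```

On Chris's node the configured CPUs match the hardware threads (80), while Marcus's configured CPUs match the physical cores (48) and only half of the 96 threads that slurmd -C reports.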

I'm not sure whether CR_ONE_TASK_PER_CORE is the cause, but it is at
least one of the major differences between the two setups. I'm wondering
whether the effort to force the use of physical cores only is applied
twice: once by removing the 48 threads from the node definition AND once
by setting CR_ONE_TASK_PER_CORE. My impression is that with
CR_ONE_TASK_PER_CORE, --ntasks-per-node accounts for threads (you have
set ThreadsPerCore=2), hence only 24 may work, whereas
CR_ONE_TASK_PER_CORE doesn't affect the selection of 'cores only' with
--ntasks.
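
The guess above, written out as arithmetic. This is only a sketch of the hypothesis, not confirmed Slurm behaviour; the function name is invented for illustration.

```python
def tasks_if_threads_counted(configured_cpus: int, threads_per_core: int) -> int:
    """Task limit per node IF the configured CPUs were divided by
    ThreadsPerCore (the hypothesis in this mail, not verified)."""
    return configured_cpus // threads_per_core


# Marcus's configured values: CPUs=48, ThreadsPerCore=2
limit = tasks_if_threads_counted(48, 2)
requested = 48  # from: srun -N 1 --ntasks-per-node=48

print(limit)               # 24
print(requested <= limit)  # False, which would match the denied submission
```

If the guess holds, a request with --ntasks-per-node=24 should be accepted on that node, which would be a quick way to test it.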

We don't use CR_ONE_TASK_PER_CORE; instead, our users set either -c 2 or
--hint=nomultithread, which results in a core-only allocation.

You could also enforce this with a job_submit plugin or a Lua job_submit
script.

The fact that CR_ONE_TASK_PER_CORE is described as "under changed" in
the public bugs, and that there is a non-accessible bug about it,
probably means it is better not to use it unless you have to.



On 2/20/19 7:49 AM, Chris Samuel wrote:
> On Tuesday, 19 February 2019 10:14:21 PM PST Marcus Wagner wrote:
>> sbatch -N 1 --ntasks-per-node=48 --wrap hostname
>> submission denied, got jobid 199805
> On one of our 40 core nodes with 2 hyperthreads:
> $ srun -C gpu -N 1 --ntasks-per-node=80 hostname | uniq -c
>      80 nodename02
> The spec is:
> CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2
> Hope this helps!
> All the best,
> Chris

Dr. Andreas Henkel
Operativer Leiter HPC
Zentrum für Datenverarbeitung
Johannes Gutenberg Universität
Anselm-Franz-von-Bentzelweg 12
55099 Mainz
Telefon: +49 6131 39 26434
OpenPGP Fingerprint: FEC6 287B EFF3
7998 A141 03BA E2A9 089F 2D8E F37E
