[slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used
Marcus Wagner
wagner at itc.rwth-aachen.de
Fri Mar 6 08:13:24 UTC 2020
Hi Alexander,
glad to help.
Best
Marcus
On 3/5/20 4:47 PM, Alexander Grund wrote:
> Hi Marcus,
>
> see below for the request info
>
>> scontrol show config | grep SelectTypeParameters
> SelectTypeParameters = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK,CR_PACK_NODES
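As background on these flags: CR_CORE_MEMORY makes cores and memory the
consumable resources, and CR_ONE_TASK_PER_CORE allocates one task per
physical core by default. With that setup, --hint=nomultithread limits a
job to one hardware thread per core, so the physical core count is what
matters. A quick way to inspect a node's topology (the node name is a
placeholder):

    scontrol show node <nodename> | grep -oE '(Sockets|CoresPerSocket|ThreadsPerCore|CPUTot)=[0-9]+'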
>>
>> But first I would like to see what
>>
>> sbatch -vvv jobscript
>>
>> outputs.
> salloc: defined options
> salloc: -------------------- --------------------
> salloc: account : zihforschung
> salloc: cpus-per-task : 50
> salloc: hint : nomultithread
> salloc: mem-per-cpu : 100
> salloc: ntasks : 1
> salloc: partition : ml
> salloc: time : 00:10:00
> salloc: verbose : 3
> salloc: -------------------- --------------------
> salloc: end of defined options
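For reference, the option listing above corresponds roughly to this
command line; this is a reconstruction from the log, not the original
jobscript:

    salloc --account=zihforschung --partition=ml --ntasks=1 --cpus-per-task=50 \
           --mem-per-cpu=100 --hint=nomultithread --time=00:10:00 -vvv

Note that salloc lists every defined option here regardless of whether it
came from the command line or from an input environment variable such as
SLURM_HINT, which is exactly what makes the hint easy to miss.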
> salloc: debug2: spank: spank_cloud.so: init_post_opt = 0
> salloc: debug2: spank: spank_beegfs.so: init_post_opt = 0
> salloc: debug2: spank: spank_nv_gpufreq.so: init_post_opt = 0
> salloc: debug: Entering slurm_allocation_msg_thr_create()
> salloc: debug: port from net_stream_listen is 36988
> salloc: debug: Entering _msg_thr_internal
> salloc: debug: Munge authentication plugin loaded
> salloc: select/cons_tres loaded with argument 4884
> salloc: Cray/Aries node selection plugin loaded
> salloc: Consumable Resources (CR) Node Selection plugin loaded with argument 4884
> salloc: Linear node selection plugin loaded with argument 4884
> salloc: debug2: eio_message_socket_accept: got message connection from 10.1.129.243:49746 8
> salloc: error: Job submit/allocate failed: Requested node configuration is not available
> salloc: debug2: slurm_allocation_msg_thr_destroy: clearing up message thread
> salloc: Job allocation 18847818 has been revoked.
> salloc: debug2: false, shutdown
> salloc: debug: Leaving _msg_thr_internal
> salloc: debug2: spank: spank_cloud.so: exit = 0
> salloc: debug2: spank: spank_nv_gpufreq.so: exit = 0
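The failing combination is cpus-per-task=50 together with
hint=nomultithread: with the hint set, Slurm uses only one hardware
thread per physical core, so a single task needs a node with at least 50
physical cores. A node that reaches 50 CPUs only via SMT (say, a
hypothetical 44 cores x 4 threads = 176 CPUs) no longer qualifies, hence
"Requested node configuration is not available". Note also that
--mem-per-cpu is charged per allocated CPU, so whether a "CPU" means a
core or a thread changes the effective memory request as well. The
per-node topology of the partition can be listed with standard sinfo
format fields:

    sinfo -N -p ml -o '%N %c %z'    # node name, CPUs, sockets:cores:threads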
>
>
> So, good idea: it seems someone defined "SLURM_HINT=nomultithread" in all
> users' environments. Removing that makes the allocation succeed.
>
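A minimal sketch of the fix, assuming a bash-like login environment; the
one-off override works because salloc command-line options take
precedence over SLURM_* input environment variables:

    # find out whether the hint is injected globally
    env | grep '^SLURM_HINT'
    # either remove it for the current shell ...
    unset SLURM_HINT
    # ... or override it for a single allocation
    salloc --hint=multithread --partition=ml --ntasks=1 --cpus-per-task=50 \
           --mem-per-cpu=100 --time=00:10:00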
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de