[slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

Marcus Wagner wagner at itc.rwth-aachen.de
Fri Mar 6 08:13:24 UTC 2020


Hi Alexander,


glad to help.

Best
Marcus

On 3/5/20 4:47 PM, Alexander Grund wrote:
> Hi Marcus,
>
> see below for the request info
>
>> scontrol show config | grep SelectTypeParameters
> SelectTypeParameters    = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK,CR_PACK_NODES
>>
>> But I would first like to see what
>>
>> sbatch -vvv jobscript
>>
>> outputs.
> salloc: defined options
> salloc: -------------------- --------------------
> salloc: account             : zihforschung
> salloc: cpus-per-task       : 50
> salloc: hint                : nomultithread
> salloc: mem-per-cpu         : 100
> salloc: ntasks              : 1
> salloc: partition           : ml
> salloc: time                : 00:10:00
> salloc: verbose             : 3
> salloc: -------------------- --------------------
> salloc: end of defined options
> salloc: debug2: spank: spank_cloud.so: init_post_opt = 0
> salloc: debug2: spank: spank_beegfs.so: init_post_opt = 0
> salloc: debug2: spank: spank_nv_gpufreq.so: init_post_opt = 0
> salloc: debug:  Entering slurm_allocation_msg_thr_create()
> salloc: debug:  port from net_stream_listen is 36988
> salloc: debug:  Entering _msg_thr_internal
> salloc: debug:  Munge authentication plugin loaded
> salloc: select/cons_tres loaded with argument 4884
> salloc: Cray/Aries node selection plugin loaded
> salloc: Consumable Resources (CR) Node Selection plugin loaded with argument 4884
> salloc: Linear node selection plugin loaded with argument 4884
> salloc: debug2: eio_message_socket_accept: got message connection from 10.1.129.243:49746 8
> salloc: error: Job submit/allocate failed: Requested node configuration is not available
> salloc: debug2: slurm_allocation_msg_thr_destroy: clearing up message thread
> salloc: Job allocation 18847818 has been revoked.
> salloc: debug2:   false, shutdown
> salloc: debug:  Leaving _msg_thr_internal
> salloc: debug2: spank: spank_cloud.so: exit = 0
> salloc: debug2: spank: spank_nv_gpufreq.so: exit = 0
>
>
> So, good idea: it seems someone defined "SLURM_HINT=nomultithread" in all
> users' environments. Removing that makes the allocation succeed.
>
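
For reference, a minimal sketch of the round trip, reconstructed from the defined options quoted above (the account and partition names come from that log and are site-specific; SLURM_HINT is simply the environment-variable form of --hint):

    env | grep SLURM_HINT
    # SLURM_HINT=nomultithread   <- set for every user in this case

    # With nomultithread, only one hardware thread per core counts as a CPU,
    # so this request fails with "Requested node configuration is not available":
    salloc --account=zihforschung --partition=ml --ntasks=1 \
           --cpus-per-task=50 --mem-per-cpu=100 --time=00:10:00

    # Drop the hint (or pass --hint=multithread explicitly) and retry:
    unset SLURM_HINT
    salloc --account=zihforschung --partition=ml --ntasks=1 \
           --cpus-per-task=50 --mem-per-cpu=100 --time=00:10:00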

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



