[slurm-users] GPUs not available after making use of all threads?

Mon Feb 13 08:15:38 UTC 2023

Hi Sebastian,

I am glad I could help (although not exactly as expected :-).

With your node configuration you are "circumventing" how Slurm behaves 
when using "CR_Core". If you read the respective part in

https://slurm.schedmd.com/slurm.conf.html

it says:

"CR_Core
   [...] On nodes with hyper-threads, each thread is counted as a CPU to 
satisfy a job's resource requirement, but multiple jobs are not 
allocated threads on the same core."

That's why you got a full core (both threads) when allocating a single 
CPU, or, for example, four threads when allocating three CPUs, and so on.
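
To make the rounding concrete, here is a small sketch of the arithmetic. 
It is only a simplified model of the documented CR_Core behaviour (whole 
cores are handed out), not Slurm's actual allocation code; the function 
name allocated_threads is made up for illustration.

```python
import math


def allocated_threads(requested_cpus: int, threads_per_core: int) -> int:
    """Simplified model (an assumption, not Slurm code): with CR_Core a job
    always receives whole cores, so the request is rounded up to a multiple
    of the configured ThreadsPerCore."""
    cores = math.ceil(requested_cpus / threads_per_core)
    return cores * threads_per_core


if __name__ == "__main__":
    # Compare a node declared with ThreadsPerCore=2 against one declared
    # with ThreadsPerCore=1 (every hardware thread presented as a core).
    for requested in (1, 3, 4):
        print(
            f"requested={requested} "
            f"ThreadsPerCore=2: {allocated_threads(requested, 2)} "
            f"ThreadsPerCore=1: {allocated_threads(requested, 1)}"
        )
```

In this model a node declared with ThreadsPerCore=1 never rounds up, 
because every hardware thread is presented to Slurm as its own core.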

"Lying" to Slurm about the actual hardware setup helps to avoid this 
behaviour, but are you really comfortable with potentially running two 
different jobs on the hyper-threads of the same core?

Regards,
Hermann

On 2/12/23 22:04, Sebastian Schmutzhard-Höfler wrote:
> Hi Hermann,
> 
> Using your suggested settings did not work for us.
> 
> When trying to allocate a single thread with --cpus-per-task=1, it still 
> reserved a whole core (two threads). On the other hand, when requesting 
> an even number of threads, it did what it should.
> 
> However, I could make it work by using
> 
> SelectTypeParameters=CR_Core
> NodeName=nodename Sockets=2 CoresPerSocket=128 ThreadsPerCore=1
> 
> instead of
> 
> SelectTypeParameters=CR_Core
> NodeName=nodename Sockets=2 CoresPerSocket=64 ThreadsPerCore=2
> 
> So your suggestion pointed me in the right direction. Thanks!
> 
> If anyone thinks this is complete nonsense, please let me know!
> 
> Best wishes,
> 
> Sebastian
> 
> On 11.02.23 11:13, Hermann Schwärzler wrote:
>> Hi Sebastian,
>>
>> we did a similar thing just recently.
>>
>> We changed our node settings from
>>
>> NodeName=DEFAULT CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 
>> ThreadsPerCore=2
>>
>> to
>>
>> NodeName=DEFAULT Boards=1 SocketsPerBoard=2 CoresPerSocket=32 
>> ThreadsPerCore=2
>>
>> in order to make it possible to use individual hyper-threads (we use 
>> this in combination with
>> SelectTypeParameters=CR_Core_Memory).
>>
>> This works as expected: after this, when e.g. asking for 
>> --cpus-per-task=4 you will get 4 hyper-threads (2 cores) per task 
>> (unless you also specify e.g. "--hint=nomultithread").
>>
>> So you might try to remove the "CPUs=256" part of your 
>> node-specification to let Slurm do that calculation of the number of 
>> CPUs itself.
>>
>>
>> BTW, as a side note: most of our users do not bother to use 
>> hyper-threads, or do not even want to because their programs might 
>> suffer from doing so, so we made "--hint=nomultithread" the default in 
>> our installation by adding
>>
>> CliFilterPlugins=cli_filter/lua
>>
>> to our slurm.conf and creating a cli_filter.lua file in the same 
>> directory as slurm.conf, that contains this
>>
>> function slurm_cli_setup_defaults(options, early_pass)
>>         options['hint'] = 'nomultithread'
>>
>>         return slurm.SUCCESS
>> end
>>
>> (see also 
>> https://github.com/SchedMD/slurm/blob/master/etc/cli_filter.lua.example).
>> So if users really want to use hyper-threads, they have to add 
>> "--hint=multithread" to their job/allocation options.
>>
>> Regards,
>> Hermann
>>
>> On 2/10/23 00:31, Sebastian Schmutzhard-Höfler wrote:
>>> Dear all,
>>>
>>> we have a node with 2 x 64 cores (with two threads each) and 8 GPUs, 
>>> running Slurm 22.05.5.
>>>
>>> In order to make use of individual threads, we changed
>>>
>>> SelectTypeParameters=CR_Core
>>> NodeName=nodename CPUs=256 Sockets=2 CoresPerSocket=64 
>>> ThreadsPerCore=2
>>>
>>> to
>>>
>>> SelectTypeParameters=CR_CPU
>>> NodeName=nodename CPUs=256
>>>
>>> We are now able to allocate individual threads to jobs, despite the 
>>> following error in slurmd.log:
>>>
>>> error: Node configuration differs from hardware: CPUs=256:256(hw) 
>>> Boards=1:1(hw) SocketsPerBoard=256:2(hw) CoresPerSocket=1:64(hw) 
>>> ThreadsPerCore=1:2(hw)
>>>
>>>
>>> However, it appears that since this change, we can only make use of 4 
>>> out of the 8 GPUs.
>>> The output of "sinfo -o %G" might be relevant.
>>>
>>> In the first situation it was
>>>
>>> $ sinfo -o %G
>>> GRES
>>> gpu:A100:8(S:0,1)
>>>
>>> Now it is:
>>>
>>> $ sinfo -o %G
>>> GRES
>>> gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126) 
>>>
>>>
>>> Has anyone faced this or a similar issue and can give me some 
>>> directions?
>>>
>>> Best wishes
>>>
>>> Sebastian
>>>
>>
>