[slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

Marcus Wagner wagner at itc.rwth-aachen.de
Mon Jan 13 06:37:43 UTC 2020


Hi Beatrice,

we are also still on 18.08.7, but we have a similar problem here with 
the billing, which is much too high (cf. "[slurm-users] exclusive or 
not exclusive, that is the question"). Slurm > 18.08.7 exacerbates 
the problem, as those jobs don't even get scheduled :/

Best
Marcus


On 1/10/20 4:58 PM, Beatrice Charton wrote:
> Hi,
>
> Happy new Year ;-)
>
> I just updated Slurm to 18.08.9: same behaviour. Jobs still stay PD forever instead of being refused :-(
> Am I the only one in this situation?
>
> Sincerely,
>
> 	Béatrice
>
>
>> On 16 Dec 2019, at 09:49, Beatrice Charton <Beatrice.Charton at criann.fr> wrote:
>>
>> Hi Marcus and Bjørn-Helge
>>
>> Thank you for your answers.
>>
>> We don't use Slurm billing; we use system accounting billing.
>> I also confirm that with --exclusive there is a difference between ReqCPUS and AllocCPUS. Previously, --mem-per-cpu behaved more like a --mem-per-task than a --mem-per-cpu: it was associated with ReqCPUS. It looks like it is now associated with AllocCPUS.
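>>
>> The difference shows up in sacct, for example (the job id below is just a placeholder):
>> 	$ sacct -j <jobid> --format=JobID,ReqCPUS,AllocCPUS,ReqMem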
>>
>> If it's not a side effect, why are such jobs not rejected, instead of being accepted and left pending forever?
>> The behaviour is the same in 19.05.2 but corrected in 19.05.3, so the problem seems to be known in v19 but not fixed in v18.
>>
>> Sincerely,
>>
>> 	Béatrice
>>
>>> On 12 Dec 2019, at 12:10, Marcus Wagner <wagner at itc.rwth-aachen.de> wrote:
>>>
>>> Hi Beatrice and Bjørn-Helge,
>>>
>>> I can confirm that it works with 18.08.7. We additionally use TRESBillingWeights together with PriorityFlags=MAX_TRES. For example:
>>> TRESBillingWeights="CPU=1.0,Mem=0.1875G,gres/gpu=12.0"
>>> We use the billing factor for our external accounting, so that the nodes are accounted for fairly. But we see a similar effect due to --exclusive.
>>> In Béatrice's case, the billing weight would be:
>>> TRESBillingWeights="CPU=1.0,Mem=0.21875G"
>>> So, a 10 cpu job with 1 GB per cpu would be billed 10.
>>> A 1 cpu job with 10 GB would be billed 2 (0.21875 * 10 = 2.1875, floored).
>>> An exclusive 10 cpu job with 1 GB per cpu would be billed 28 (all 28 cores are for the job).
>>> An exclusive 1 cpu job with 30 GB per cpu (Béatrice's example) would be billed 28 (cores) * 30 (GB) * 0.21875 => 183.75 => 183 cores.
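>>>
>>> For context, the memory weight there is just cores divided by node memory (28 / 128 GB = 0.21875 per GB), so a job that fills a node's memory is billed like one that fills all its cores. A rough slurm.conf sketch of such a setup (the partition name and node list are placeholders, not Béatrice's actual configuration):
>>> 	PriorityFlags=MAX_TRES
>>> 	PartitionName=example Nodes=node[01-10] TRESBillingWeights="CPU=1.0,Mem=0.21875G"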
>>>
>>> Best
>>> Marcus
>>>
>>> On 12/12/19 9:47 AM, Bjørn-Helge Mevik wrote:
>>>> Beatrice Charton <beatrice.charton at criann.fr> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a strange behaviour of Slurm after updating from 18.08.7 to
>>>>> 18.08.8, for jobs using --exclusive and --mem-per-cpu.
>>>>>
>>>>> Our nodes have 128GB of memory, 28 cores.
>>>>> 	$ srun  --mem-per-cpu=30000 -n 1  --exclusive  hostname
>>>>> => works in 18.08.7
>>>>> => doesn’t work in 18.08.8
>>>> I'm actually surprised it _worked_ in 18.08.7.  At one time, long before
>>>> v 18.08, the behaviour was changed when using --exclusive: in order to
>>>> account the job for all cpus on the node, the number of cpus asked for
>>>> with --ntasks would simply be multiplied by "#cpus-on-node / --ntasks"
>>>> (so in your case: 28).  Unfortunately, that also means that the memory
>>>> the job requires per node is "#cpus-on-node / --ntasks" multiplied by
>>>> --mem-per-cpu (in your case 28 * 30000 MiB ~= 820 GiB), far more than
>>>> the 128 GB your nodes have, so such a job can never be scheduled.  For
>>>> this reason, we tend to ban --exclusive on our clusters (or at least
>>>> warn about it).
>>>>
>>>> I haven't looked at the code for a long time, so I don't know whether
>>>> this is still the current behaviour, but every time I've tested, I've
>>>> seen the same problem.  I believe I've tested on 19.05 (but I might
>>>> remember wrong).
>>>>
>>> -- 
>>> Marcus Wagner, Dipl.-Inf.
>>>
>>> IT Center
>>> Abteilung: Systeme und Betrieb
>>> RWTH Aachen University
>>> Seffenter Weg 23
>>> 52074 Aachen
>>> Tel: +49 241 80-24383
>>> Fax: +49 241 80-624383
>>> wagner at itc.rwth-aachen.de
>>> www.itc.rwth-aachen.de
>>>
>>>
>> -- 
>> Béatrice CHARTON		|              CRIANN
>> Beatrice.Charton at criann.fr	|  745, avenue de l'Université
>> Tel : +33 (0)2 32 91 42 91 	| 76800 Saint Etienne du Rouvray
>>        ---   Support : support at criann.fr   ---
>>

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



