[slurm-users] Strange error, submission denied
Marcus Wagner
wagner at itc.rwth-aachen.de
Wed Feb 20 05:08:38 UTC 2019
Hi Prentice,
On 2/19/19 2:58 PM, Prentice Bisbal wrote:
>
> --ntasks-per-node is meant to be used in conjunction with --nodes
> option. From https://slurm.schedmd.com/sbatch.html:
>
>> *--ntasks-per-node*=</ntasks/>
>> Request that /ntasks/ be invoked on each node. If used with the
>> *--ntasks* option, the *--ntasks* option will take precedence and
>> the *--ntasks-per-node* will be treated as a /maximum/ count of
>> tasks per node. Meant to be used with the *--nodes* option...
>>
Yes, but combined with --ntasks that would simply mean at most 48 tasks
per node. I don't see why that should make a difference for job
submission. Even if the semantics (how many cores get scheduled onto how
many hosts) were wrong, the syntax at least should be accepted.
>>
> If you don't specify --ntasks, it defaults to --ntasks=1, as Andreas
> said. https://slurm.schedmd.com/sbatch.html:
>
>> *-n*, *--ntasks*=</number/>
>> sbatch does not launch tasks, it requests an allocation of
>> resources and submits a batch script. This option advises the
>> Slurm controller that job steps run within the allocation will
>> launch a maximum of /number/ tasks and to provide for sufficient
>> resources. The default is one task per node, but note that the
>> *--cpus-per-task* option will change this default.
>>
> So the correct way to specify your job is either like this
>
> --ntasks=48
>
> or
>
> --nodes=1 --ntasks-per-node=48
>
> Specifying both --ntasks-per-node and --ntasks at the same time is not
> correct.
>
Funnily enough, the second variant fails in the same way as my original
submission:
$> sbatch -N 1 --ntasks-per-node=48 --wrap hostname
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
whereas using just --ntasks=48 is accepted, and the job gets scheduled onto
one host:
$> sbatch --ntasks=48 --wrap hostname
sbatch: [I] No output file given, set to: output_%j.txt
sbatch: [I] No runtime limit given, set to: 15 minutes
Submitted batch job 199784
$> scontrol show job 199784 | egrep "NumNodes|TRES"
NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=48,mem=182400M,node=1,billing=48
To me, this still looks like a bug, not like the wrong usage of
submission parameters.
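One way to compare the three variants from this thread without actually queueing anything is sbatch's --test-only option, which validates the request instead of submitting it. The following is only a minimal sketch, assuming a POSIX shell; the loop and the variable names (opts, checked) are illustrative, and on a machine without sbatch it merely prints the commands it would run:

```shell
# Try each option combination from this thread with --test-only,
# so that no job is actually submitted.
checked=0
for opts in \
    "--ntasks=48" \
    "--nodes=1 --ntasks-per-node=48" \
    "--ntasks=48 --ntasks-per-node=48"
do
    echo "== $opts"
    if command -v sbatch >/dev/null 2>&1; then
        # $opts is intentionally unquoted so it splits into separate options.
        # Print a marker if sbatch exits with a non-zero status.
        sbatch --test-only $opts --wrap hostname || echo "rejected: $opts"
    else
        echo "sbatch not found, would run: sbatch --test-only $opts --wrap hostname"
    fi
    checked=$((checked + 1))
done
```

Running the same loop on a second partition or node type would show whether the rejection depends on the node definition.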
Does no one else use nodes in this shared way?
If nodes are shared, do you schedule by hardware threads or by cores?
If you schedule by cores, how did you implement this in slurm?
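For reference, here is the arithmetic behind the threads-versus-cores question, using the values from the NodeName line quoted further down in this thread. This is just a sketch with variable names of my own choosing; it only shows which count the configured CPUs= value matches and does not claim that this is the cause of the error. As far as I understand the slurm.conf documentation, CPUs would default to the full product (including ThreadsPerCore) if it were omitted.

```shell
# Values copied from the NodeName line quoted further down in this thread.
cpus=48
sockets=4
cores_per_socket=12
threads_per_core=2

cores=$((sockets * cores_per_socket))    # physical cores per node
threads=$((cores * threads_per_core))    # hardware threads per node

echo "cores=$cores threads=$threads configured CPUs=$cpus"

# Work out which of the two counts the configured CPUs= value matches.
if [ "$cpus" -eq "$threads" ]; then
    cpu_unit="thread"
elif [ "$cpus" -eq "$cores" ]; then
    cpu_unit="core"
else
    cpu_unit="unknown"
fi
echo "CPUs= matches the $cpu_unit count"
```

With these numbers, CPUs=48 corresponds to the core count, while the product including ThreadsPerCore=2 is 96.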
Best
Marcus
>
>
> Prentice
> On 2/14/19 1:09 AM, Henkel, Andreas wrote:
>> Hi Marcus,
>>
>> What just came to my mind: if you don’t set --ntasks, isn’t the default just 1? All examples I know using ntasks-per-node also set ntasks with ntasks >= ntasks-per-node.
>>
>> Best,
>> Andreas
>>
>>> On 14.02.2019 at 06:33, Marcus Wagner <wagner at itc.rwth-aachen.de> wrote:
>>>
>>> Hi all,
>>>
>>> I have narrowed this down a little bit.
>>>
>>> The really astonishing thing is that if I use
>>>
>>> --ntasks=48
>>>
>>> I can submit the job, it will be scheduled onto one host:
>>>
>>> NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>> TRES=cpu=48,mem=182400M,node=1,billing=48
>>>
>>> but as soon as I change --ntasks to --ntasks-per-node (which should be the same, as --ntasks=48 schedules onto one host), I get the error:
>>>
>>> sbatch: error: CPU count per node can not be satisfied
>>> sbatch: error: Batch job submission failed: Requested node configuration is not available
>>>
>>>
>>> Is no one else seeing this behaviour?
>>> Any explanations?
>>>
>>>
>>> Best
>>> Marcus
>>>
>>>
>>>> On 2/13/19 1:48 PM, Marcus Wagner wrote:
>>>> Hi all,
>>>>
>>>> I am seeing some strange behaviour here.
>>>> We are using slurm 18.08.5-2 on CentOS 7.6.
>>>>
>>>> Let me first describe our compute nodes:
>>>> NodeName=ncm[0001-1032] CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000 Feature=skx8160,hostok,hpcwork Weight=10541 State=UNKNOWN
>>>>
>>>> we have the following config set:
>>>>
>>>> $>scontrol show config | grep -i select
>>>> SelectType = select/cons_res
>>>> SelectTypeParameters = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE
>>>>
>>>>
>>>> So, I have 48 cores on one node. According to the manpage of sbatch, I should be able to do the following:
>>>>
>>>> #SBATCH --ntasks=48
>>>> #SBATCH --ntasks-per-node=48
>>>>
>>>> But I get the following error:
>>>> sbatch: error: Batch job submission failed: Requested node configuration is not available
>>>>
>>>>
>>>> Does anyone have an explanation for this?
>>>>
>>>>
>>>> Best
>>>> Marcus
>>>>
>>> --
>>> Marcus Wagner, Dipl.-Inf.
>>>
>>> IT Center
>>> Abteilung: Systeme und Betrieb
>>> RWTH Aachen University
>>> Seffenter Weg 23
>>> 52074 Aachen
>>> Tel: +49 241 80-24383
>>> Fax: +49 241 80-624383
>>> wagner at itc.rwth-aachen.de
>>> www.itc.rwth-aachen.de
>>>
>>>
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de