[slurm-users] Strange error, submission denied
Prentice Bisbal
pbisbal at pppl.gov
Wed Feb 20 15:08:48 UTC 2019
On 2/20/19 12:08 AM, Marcus Wagner wrote:
> Hi Prentice,
>
>
> On 2/19/19 2:58 PM, Prentice Bisbal wrote:
>>
>> --ntasks-per-node is meant to be used in conjunction with the --nodes
>> option. From https://slurm.schedmd.com/sbatch.html:
>>
>>> --ntasks-per-node=<ntasks>
>>> Request that ntasks be invoked on each node. If used with the
>>> --ntasks option, the --ntasks option will take precedence
>>> and the --ntasks-per-node will be treated as a maximum count
>>> of tasks per node. Meant to be used with the --nodes option...
>>>
> Yes, but using it together with --ntasks would just mean a maximum of
> e.g. 48 tasks per node. I don't see where the difference lies as far as
> submitting the job is concerned. Even if the semantics (how many cores
> get scheduled onto which number of hosts) were incorrect, at least the
> syntax should be correct.
The difference would be in how Slurm looks at those specifications
internally. To us humans, what you say should work seems logical, but if
Slurm wasn't programmed to behave that way, it won't. I provided the
quote from the documentation, since that implies, to me at least, that
Slurm isn't programmed to behave like that. Looking at the source code
or asking SchedMD could confirm that.
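A quick way to see how Slurm interprets each combination (assuming the
submission is accepted at all) is to submit both variants and compare what
scontrol records for the resulting jobs; <jobid> below is just a placeholder
for whatever ID sbatch reports:

$> sbatch --ntasks=48 --wrap hostname
$> sbatch --nodes=1 --ntasks=48 --ntasks-per-node=48 --wrap hostname
$> scontrol show job <jobid> | egrep "NumNodes|NumTasks|NumCPUs"

If the precedence rule quoted above holds, both jobs should end up requesting
48 tasks, with --ntasks-per-node only acting as an upper bound per node.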
>> If you don't specify --ntasks, it defaults to --ntasks=1, as Andreas
>> said. https://slurm.schedmd.com/sbatch.html:
>>
>>> -n, --ntasks=<number>
>>> sbatch does not launch tasks, it requests an allocation of
>>> resources and submits a batch script. This option advises the
>>> Slurm controller that job steps run within the allocation will
>>> launch a maximum of number tasks and to provide for sufficient
>>> resources. The default is one task per node, but note that the
>>> --cpus-per-task option will change this default.
>>>
>> So the correct way to specify your job is either like this:
>>
>> --ntasks=48
>>
>> or
>>
>> --nodes=1 --ntasks-per-node=48
>>
>> Specifying both --ntasks-per-node and --ntasks at the same time is
>> not correct.
>>
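For completeness, either of those forms dropped into a batch script would
look roughly like this (just a sketch; the 48 matches the core count
discussed in this thread, the srun step is only a stand-in for a real
workload, and lines starting with ##SBATCH are ignored by sbatch):

#!/bin/bash
#SBATCH --ntasks=48
##SBATCH --nodes=1
##SBATCH --ntasks-per-node=48
srun hostname

To test the second form, comment out the --ntasks line and remove one '#'
from each of the two lines below it.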
> Funnily enough, the result is the same:
>
> $> sbatch -N 1 --ntasks-per-node=48 --wrap hostname
> sbatch: error: CPU count per node can not be satisfied
> sbatch: error: Batch job submission failed: Requested node
> configuration is not available
>
> whereas with just --ntasks=48 the job is accepted and gets scheduled
> onto one host:
>
> $> sbatch --ntasks=48 --wrap hostname
> sbatch: [I] No output file given, set to: output_%j.txt
> sbatch: [I] No runtime limit given, set to: 15 minutes
> Submitted batch job 199784
> $> scontrol show job 199784 | egrep "NumNodes|TRES"
> NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=48,mem=182400M,node=1,billing=48
>
> To me, this still looks like a bug, not like incorrect use of the
> submission parameters.
Either a bug, or there's something subtly wrong with your slurm.conf. I
would continue troubleshooting by simplifying both your node definition
and your SelectType options as much as possible and seeing whether the
problem still persists. Also, look at 'scontrol show node <node name>' to
see if your definition in slurm.conf lines up with how Slurm actually sees
the node. I don't think I've seen that output anywhere in this thread yet.
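As a rough sketch of what I mean (the numbers are copied from the NodeName
line posted earlier in this thread, and this is only one possible
simplification, not a claim about the cause): drop the explicit CPUs= count
so Slurm derives it from the topology, leave out CR_ONE_TASK_PER_CORE for
the test, and then compare against what slurmd actually reports:

# slurm.conf, pared down for testing
NodeName=ncm0001 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000 State=UNKNOWN
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# compare with what Slurm actually sees on the node
$> scontrol show node ncm0001 | egrep "CPUTot|Sockets|CoresPerSocket|ThreadsPerCore|RealMemory"

If CPUTot there doesn't match what you expect from your slurm.conf line,
that mismatch would be the first thing I'd chase.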
>
> Does no one else use nodes in this shared way?
> If nodes are shared, do you schedule by hardware threads or by cores?
> If you schedule by cores, how did you implement this in slurm?
>
>
> Best
> Marcus
>>
>>
>> Prentice
>> On 2/14/19 1:09 AM, Henkel, Andreas wrote:
>>> Hi Marcus,
>>>
>>> What just came to my mind: if you don't set --ntasks, isn't the default just 1? All the examples I know of that use --ntasks-per-node also set --ntasks, with ntasks >= ntasks-per-node.
>>>
>>> Best,
>>> Andreas
>>>
>>>> On 14.02.2019 at 06:33, Marcus Wagner <wagner at itc.rwth-aachen.de> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have narrowed this down a little bit.
>>>>
>>>> The really astonishing thing is that if I use
>>>>
>>>> --ntasks=48
>>>>
>>>> I can submit the job and it gets scheduled onto one host:
>>>>
>>>> NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>>> TRES=cpu=48,mem=182400M,node=1,billing=48
>>>>
>>>> but as soon as I change --ntasks to --ntasks-per-node (which should amount to the same thing, since --ntasks=48 gets scheduled onto one host anyway), I get the error:
>>>>
>>>> sbatch: error: CPU count per node can not be satisfied
>>>> sbatch: error: Batch job submission failed: Requested node configuration is not available
>>>>
>>>>
>>>> Is there no one else who observes this behaviour?
>>>> Any explanations?
>>>>
>>>>
>>>> Best
>>>> Marcus
>>>>
>>>>
>>>>> On 2/13/19 1:48 PM, Marcus Wagner wrote:
>>>>> Hi all,
>>>>>
>>>>> I am seeing some strange behaviour here.
>>>>> We are using slurm 18.08.5-2 on CentOS 7.6.
>>>>>
>>>>> Let me first describe our compute nodes:
>>>>> NodeName=ncm[0001-1032] CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000 Feature=skx8160,hostok,hpcwork Weight=10541 State=UNKNOWN
>>>>>
>>>>> we have the following config set:
>>>>>
>>>>> $>scontrol show config | grep -i select
>>>>> SelectType = select/cons_res
>>>>> SelectTypeParameters = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE
>>>>>
>>>>>
>>>>> So, I have 48 cores on one node. According to the manpage of sbatch, I should be able to do the following:
>>>>>
>>>>> #SBATCH --ntasks=48
>>>>> #SBATCH --ntasks-per-node=48
>>>>>
>>>>> But I get the following error:
>>>>> sbatch: error: Batch job submission failed: Requested node configuration is not available
>>>>>
>>>>>
>>>>> Does anyone have an explanation for this?
>>>>>
>>>>>
>>>>> Best
>>>>> Marcus
>>>>>
>>>> --
>>>> Marcus Wagner, Dipl.-Inf.
>>>>
>>>> IT Center
>>>> Abteilung: Systeme und Betrieb
>>>> RWTH Aachen University
>>>> Seffenter Weg 23
>>>> 52074 Aachen
>>>> Tel: +49 241 80-24383
>>>> Fax: +49 241 80-624383
>>>> wagner at itc.rwth-aachen.de
>>>> www.itc.rwth-aachen.de
>>>>
>>>>
>