[slurm-users] Job array start time and SchedNodes
Loris Bennett
loris.bennett at fu-berlin.de
Thu Dec 9 11:04:16 UTC 2021
Dear Thekla,
Yes, I think you are right. I have found a similar job on my system and
this does seem to be the normal, slightly confusing behaviour. It looks
as if the pending elements of the array get assigned a single node,
but then start on other nodes:
$ squeue -j 8536946 -O jobid,jobarrayid,reason,schednodes,nodelist,state | head
JOBID    JOBID              REASON     SCHEDNODES  NODELIST  STATE
8536946  8536946_[401-899]  Resources  g002                  PENDING
8658719  8536946_400        None       (null)      g006      RUNNING
8658685  8536946_399        None       (null)      g012      RUNNING
8658625  8536946_398        None       (null)      g001      RUNNING
8658491  8536946_397        None       (null)      g006      RUNNING
8658428  8536946_396        None       (null)      g003      RUNNING
8658427  8536946_395        None       (null)      g003      RUNNING
8658426  8536946_394        None       (null)      g007      RUNNING
8658425  8536946_393        None       (null)      g002      RUNNING
This strikes me as a bit odd.
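
For what it's worth, expanding the array and asking for the expected start
times gives a slightly clearer picture (just a rough sketch using the job ID
from above; --start only has data for elements the backfill scheduler has
actually planned, the rest show N/A):

$ squeue --start --array -j 8536946 -O jobarrayid,starttime,schednodes,reason

At least each pending element then gets its own line, although I would not
read too much into the SchedNodes column for unplanned elements.
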
Cheers,
Loris
Thekla Loizou <t.loizou at cyi.ac.cy> writes:
> Dear Loris,
>
> Thank you for your reply. To be honest, I don't believe there is anything
> wrong with either the job configuration or the node configuration.
>
> I have just submitted a simple sleep script:
>
> #!/bin/bash
>
> sleep 10
>
> as below:
>
> sbatch --array=1-10 --ntasks-per-node=40 --time=09:00:00 test.sh
>
> and squeue shows:
>
> 131799_1   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_2   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_3   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_4   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_5   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_6   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_7   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_8   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_9   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
> 131799_10  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>
> All of the jobs seem to be scheduled on node cn04.
>
> When they start running, they run on separate nodes:
>
> 131799_1 cpu test.sh thekla R 0:02 1 cn01
> 131799_2 cpu test.sh thekla R 0:02 1 cn02
> 131799_3 cpu test.sh thekla R 0:02 1 cn03
> 131799_4 cpu test.sh thekla R 0:02 1 cn04
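>
> For the record, sacct should also be able to confirm after the fact which
> node each element actually ran on, e.g. something like:
>
> sacct -j 131799 --format=JobID,NodeList,Start,End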
>
> Regards,
>
> Thekla
>
> On 7/12/21 5:17 p.m., Loris Bennett wrote:
>> Dear Thekla,
>>
>> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>>
>>> Dear Loris,
>>>
>>> There is no specific node required for this array. I can verify that from
>>> "scontrol show job 124841" since the requested node list is empty:
>>> ReqNodeList=(null)
>>>
>>> Also, all 17 nodes of the cluster are identical so all nodes fulfill the job
>>> requirements, not only node cn06.
>>>
>>> By "saving" the other nodes I mean that the scheduler estimates that the array
>>> jobs will start on 2021-12-11T03:58:00. No other jobs are scheduled to run
>>> during that time on the other nodes. So it seems that somehow the scheduler
>>> schedules the array jobs on more than one nodes but this is not showing in the
>>> squeue or scontrol output.
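>>> Querying the elements individually should show the same estimated start
>>> time, e.g. something along the lines of
>>>
>>> for i in $(seq 1 9); do scontrol show job 124841_$i | grep -o 'StartTime=[^ ]*'; done
>>>
>>> which just pulls the StartTime field out of the scontrol output for each
>>> element.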
>> My guess is that there is something wrong with either the job
>> configuration or the node configuration, if Slurm thinks 9 jobs which
>> each require a whole node can all be started simultaneously on the same
>> node.
>>
>> Cheers,
>>
>> Loris
>>
>>> Regards,
>>>
>>> Thekla
>>>
>>>
>>> On 7/12/21 12:16 p.m., Loris Bennett wrote:
>>>> Hi Thekla,
>>>>
>>>> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I have noticed that SLURM schedules several jobs from a job array on the same
>>>>> node with the same start time and end time.
>>>>>
>>>>> Each of these jobs requires the full node. You can see the squeue output below:
>>>>>
>>>>> JOBID     PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
>>>>> 124841_1  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_2  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_3  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_4  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_5  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_6  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_7  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_8  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>> 124841_9  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>
>>>>> Is this a bug or am I missing something? Is it because the jobs have the same
>>>>> JOBID and are still in the pending state? I am aware that the jobs will not
>>>>> actually all run on the same node at the same time and that the scheduler
>>>>> somehow takes into account that this job array has 9 jobs which will need 9
>>>>> nodes. I am creating a timeline with the start times of all jobs, and when
>>>>> the jobs of the array are due to start, no other jobs are set to run on the
>>>>> remaining nodes (so the scheduler "saves" the other nodes for the jobs of the
>>>>> array, even though squeue and scontrol show them all scheduled on the same
>>>>> node).
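>>>>> (For reference, the expected start times of all pending jobs can be listed
>>>>> with something like
>>>>>
>>>>> squeue --start -t PD -o "%i %S %Y %R"
>>>>>
>>>>> i.e. job ID, expected start time, scheduled nodes and reason.)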
>>>> In general, jobs from an array will be scheduled on whatever nodes
>>>> fulfil their requirements. The fact that all the jobs have cn06 in the
>>>> SCHEDNODES column, however, seems to suggest that you have either
>>>> specified cn06 as the node the jobs should run on, or that cn06 is the
>>>> only node which fulfils the job requirements.
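>>>> You can check whether a particular node was requested or excluded with,
>>>> say,
>>>>
>>>> $ scontrol show job 124841 | grep -E 'ReqNodeList|ExcNodeList'
>>>>
>>>> which should report (null) in both cases if no node was specified.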
>>>>
>>>> I'm not sure what you mean about '"saving" the other nodes'.
>>>>
>>>> Cheers,
>>>>
>>>> Loris
>>>>
>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de