[slurm-users] Job array start time and SchedNodes
Thekla Loizou
t.loizou at cyi.ac.cy
Thu Dec 9 11:44:32 UTC 2021
Dear Loris,
Yes, it is indeed a bit odd. At least now I know that this is how SLURM
behaves and not something to do with our configuration.
Regards,
Thekla
On 9/12/21 1:04 p.m., Loris Bennett wrote:
> Dear Thekla,
>
> Yes, I think you are right. I have found a similar job on my system and
> this does seem to be the normal, slightly confusing behaviour. It looks
> as if the pending elements of the array get assigned a single node,
> but then start on other nodes:
>
> $ squeue -j 8536946 -O jobid,jobarrayid,reason,schednodes,nodelist,state | head
> JOBID      JOBID                REASON      SCHEDNODES   NODELIST   STATE
> 8536946    8536946_[401-899]    Resources   g002                    PENDING
> 8658719    8536946_400          None        (null)       g006       RUNNING
> 8658685    8536946_399          None        (null)       g012       RUNNING
> 8658625    8536946_398          None        (null)       g001       RUNNING
> 8658491    8536946_397          None        (null)       g006       RUNNING
> 8658428    8536946_396          None        (null)       g003       RUNNING
> 8658427    8536946_395          None        (null)       g003       RUNNING
> 8658426    8536946_394          None        (null)       g007       RUNNING
> 8658425    8536946_393          None        (null)       g002       RUNNING
>
> This strikes me as a bit odd.
>
> Cheers,
>
> Loris
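For anyone wanting to dig further: the pending tasks of an array share a
single job record until each task is split off to run, which appears to be
why only one SCHEDNODES value shows up for the whole pending range. A rough
way to cross-check this (a sketch; job ID 8536946 is taken from the output
above):

    # Planned nodes and expected start time of the shared pending record
    # (SchedNodeList/StartTime are fields of the scontrol output).
    scontrol show job 8536946 | grep -E 'ArrayTaskId|SchedNodeList|StartTime'

    # Per-task view, for comparison, once tasks have been split off to run.
    squeue -j 8536946 -O jobid,jobarrayid,state,schednodes,nodelist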
>
> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>
>> Dear Loris,
>>
>> Thank you for your reply. To be honest, I don't believe there is anything
>> wrong with either the job configuration or the node configuration.
>>
>> I have just submitted a simple sleep script:
>>
>> #!/bin/bash
>>
>> sleep 10
>>
>> as below:
>>
>> sbatch --array=1-10 --ntasks-per-node=40 --time=09:00:00 test.sh
>>
>> and squeue shows:
>>
>> 131799_1    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_2    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_3    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_4    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_5    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_6    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_7    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_8    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_9    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_10   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>>
>> All of the jobs seem to be scheduled on node cn04.
>>
>> When they start running they run on separate nodes:
>>
>> 131799_1 cpu test.sh thekla R 0:02 1 cn01
>> 131799_2 cpu test.sh thekla R 0:02 1 cn02
>> 131799_3 cpu test.sh thekla R 0:02 1 cn03
>> 131799_4 cpu test.sh thekla R 0:02 1 cn04
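For reference, the same test can be written as a single self-contained batch
script instead of passing the options on the command line (a sketch; the
partition name "cpu" is taken from the squeue output above, and 40 tasks per
node are assumed to fill a whole node on this cluster):

    #!/bin/bash
    #SBATCH --array=1-10           # ten array tasks
    #SBATCH --ntasks-per-node=40   # assumed to occupy a full node here
    #SBATCH --time=09:00:00
    #SBATCH --partition=cpu        # partition name taken from the output above

    sleep 10

submitted with a plain "sbatch test.sh".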
>>
>> Regards,
>>
>> Thekla
>>
>> On 7/12/21 5:17 p.m., Loris Bennett wrote:
>>> Dear Thekla,
>>>
>>> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>>>
>>>> Dear Loris,
>>>>
>>>> There is no specific node required for this array. I can verify that from
>>>> "scontrol show job 124841" since the requested node list is empty:
>>>> ReqNodeList=(null)
>>>>
>>>> Also, all 17 nodes of the cluster are identical so all nodes fulfill the job
>>>> requirements, not only node cn06.
>>>>
>>>> By "saving" the other nodes I mean that the scheduler estimates that the array
>>>> jobs will start on 2021-12-11T03:58:00, and no other jobs are scheduled to run
>>>> on the other nodes during that time. So it seems that the scheduler somehow
>>>> plans the array jobs across more than one node, but this is not shown in the
>>>> squeue or scontrol output.
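Incidentally, the scheduler's expected start time and currently planned nodes
for the pending tasks can also be pulled straight from squeue, which may make
this easier to inspect (a sketch; 124841 is the array job ID from this thread,
and an estimate is only shown once the backfill scheduler has produced one):

    # Expected start time and planned nodes for the pending jobs of the array.
    squeue --start -j 124841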
>>> My guess is that there is something wrong with either the job
>>> configuration or the node configuration, if Slurm thinks 9 jobs which
>>> each require a whole node can all be started simultaneously on the
>>> same node.
>>>
>>> Cheers,
>>>
>>> Loris
>>>
>>>> Regards,
>>>>
>>>> Thekla
>>>>
>>>>
>>>> On 7/12/21 12:16 p.m., Loris Bennett wrote:
>>>>> Hi Thekla,
>>>>>
>>>>> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have noticed that SLURM schedules several jobs from a job array on the same
>>>>>> node with the same start time and end time.
>>>>>>
>>>>>> Each of these jobs requires the full node. You can see the squeue output below:
>>>>>>
>>>>>> JOBID     PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
>>>>>> 124841_1  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_2  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_3  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_4  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_5  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_6  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_7  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_8  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_9  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>>
>>>>>> Is this a bug, or am I missing something? Is it because the jobs share the
>>>>>> same JOBID and are still pending? I am aware that the jobs will not actually
>>>>>> all run on the same node at the same time, and that the scheduler somehow
>>>>>> takes into account that this job array has 9 jobs that will need 9 nodes. I am
>>>>>> building a timeline of the start times of all jobs, and when the array jobs
>>>>>> are due to start, no other jobs are set to run on the remaining nodes (so the
>>>>>> scheduler "saves" the other nodes for the array jobs, even though squeue and
>>>>>> scontrol show them all scheduled on the same node).
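A timeline like the one described above can also be assembled directly from
squeue by listing the expected start times of all pending jobs (a sketch; the
"starttime" output field is assumed to be available in the squeue version in
use):

    # Expected start time, planned nodes and pending reason for every pending
    # job, to compare the array against the rest of the queue.
    squeue --states=PENDING -O jobid,jobarrayid,starttime,schednodes,reason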
>>>>> In general, jobs from an array will be scheduled on whatever nodes
>>>>> fulfil their requirements. The fact that all the jobs have
>>>>>
>>>>> cn06
>>>>>
>>>>> as NODELIST, however, seems to suggest that you have either specified cn06
>>>>> as the node the jobs should run on, or that cn06 is the only node which
>>>>> fulfils the job requirements.
>>>>>
>>>>> I'm not sure what you mean about '"saving" the other nodes'.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Loris
>>>>>