[slurm-users] Job array start time and SchedNodes
Thekla Loizou
t.loizou at cyi.ac.cy
Thu Dec 9 11:44:32 UTC 2021
Dear Loris,
Yes, it is indeed a bit odd. At least now I know that this is how SLURM
behaves and not something to do with our configuration.
Regards,
Thekla
On 9/12/21 1:04 p.m., Loris Bennett wrote:
> Dear Thekla,
>
> Yes, I think you are right. I have found a similar job on my system and
> this does seem to be the normal, slightly confusing behaviour. It looks
> as if the pending elements of the array get assigned a single node,
> but then start on other nodes:
>
> $ squeue -j 8536946 -O jobid,jobarrayid,reason,schednodes,nodelist,state | head
> JOBID      JOBID                REASON      SCHEDNODES   NODELIST   STATE
> 8536946    8536946_[401-899]    Resources   g002                    PENDING
> 8658719    8536946_400          None        (null)       g006       RUNNING
> 8658685    8536946_399          None        (null)       g012       RUNNING
> 8658625    8536946_398          None        (null)       g001       RUNNING
> 8658491    8536946_397          None        (null)       g006       RUNNING
> 8658428    8536946_396          None        (null)       g003       RUNNING
> 8658427    8536946_395          None        (null)       g003       RUNNING
> 8658426    8536946_394          None        (null)       g007       RUNNING
> 8658425    8536946_393          None        (null)       g002       RUNNING
>
> This strikes me as a bit odd.
>
> Cheers,
>
> Loris
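For anyone wanting to dig further: the pending tasks of an array share a
single job record until each task is split off to run, which appears to be
why only one SCHEDNODES value shows up for the whole pending range. A rough
way to cross-check this (a sketch; job ID 8536946 is taken from the output
above):

    # Planned nodes and expected start time of the shared pending record
    # (SchedNodeList/StartTime are fields of the scontrol output).
    scontrol show job 8536946 | grep -E 'ArrayTaskId|SchedNodeList|StartTime'

    # Per-task view, for comparison, once tasks have been split off to run.
    squeue -j 8536946 -O jobid,jobarrayid,state,schednodes,nodelist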
>
> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>
>> Dear Loris,
>>
>> Thank you for your reply. To be honest, I don't believe there is anything
>> wrong with either the job configuration or the node configuration.
>>
>> I have just submitted a simple sleep script:
>>
>> #!/bin/bash
>>
>> sleep 10
>>
>> as below:
>>
>> sbatch --array=1-10 --ntasks-per-node=40 --time=09:00:00 test.sh
>>
>> and squeue shows:
>>
>> 131799_1    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_2    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_3    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_4    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_5    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_6    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_7    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_8    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_9    cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>> 131799_10   cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
>>
>> All of the jobs seem to be scheduled on node cn04.
>>
>> When they start running they run on separate nodes:
>>
>> 131799_1 cpu test.sh thekla R 0:02 1 cn01
>> 131799_2 cpu test.sh thekla R 0:02 1 cn02
>> 131799_3 cpu test.sh thekla R 0:02 1 cn03
>> 131799_4 cpu test.sh thekla R 0:02 1 cn04
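For reference, the same test can be written as a single self-contained batch
script instead of passing the options on the command line (a sketch; the
partition name "cpu" is taken from the squeue output above, and 40 tasks per
node are assumed to fill a whole node on this cluster):

    #!/bin/bash
    #SBATCH --array=1-10           # ten array tasks
    #SBATCH --ntasks-per-node=40   # assumed to occupy a full node here
    #SBATCH --time=09:00:00
    #SBATCH --partition=cpu        # partition name taken from the output above

    sleep 10

submitted with a plain "sbatch test.sh".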
>>
>> Regards,
>>
>> Thekla
>>
>> On 7/12/21 5:17 p.m., Loris Bennett wrote:
>>> Dear Thekla,
>>>
>>> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>>>
>>>> Dear Loris,
>>>>
>>>> There is no specific node required for this array. I can verify that from
>>>> "scontrol show job 124841" since the requested node list is empty:
>>>> ReqNodeList=(null)
>>>>
>>>> Also, all 17 nodes of the cluster are identical so all nodes fulfill the job
>>>> requirements, not only node cn06.
>>>>
>>>> By "saving" the other nodes I mean that the scheduler estimates that the array
>>>> jobs will start on 2021-12-11T03:58:00, and no other jobs are scheduled to run
>>>> on the other nodes during that time. So it seems that the scheduler somehow
>>>> plans the array jobs across more than one node, but this is not shown in the
>>>> squeue or scontrol output.
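Incidentally, the scheduler's expected start time and currently planned nodes
for the pending tasks can also be pulled straight from squeue, which may make
this easier to inspect (a sketch; 124841 is the array job ID from this thread,
and an estimate is only shown once the backfill scheduler has produced one):

    # Expected start time and planned nodes for the pending jobs of the array.
    squeue --start -j 124841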
>>> My guess is that there is something wrong with either the job
>>> configuration or the node configuration, if Slurm thinks 9 jobs which
>>> each require a whole node can all be started simultaneously on the
>>> same node.
>>>
>>> Cheers,
>>>
>>> Loris
>>>
>>>> Regards,
>>>>
>>>> Thekla
>>>>
>>>>
>>>> On 7/12/21 12:16 p.m., Loris Bennett wrote:
>>>>> Hi Thekla,
>>>>>
>>>>> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have noticed that SLURM schedules several jobs from a job array on the same
>>>>>> node with the same start time and end time.
>>>>>>
>>>>>> Each of these jobs requires the full node. You can see the squeue output below:
>>>>>>
>>>>>> JOBID     PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
>>>>>> 124841_1  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_2  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_3  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_4  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_5  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_6  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_7  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_8  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>> 124841_9  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>>>>>
>>>>>> Is this a bug, or am I missing something? Is it because the jobs share the
>>>>>> same JOBID and are still pending? I am aware that the jobs will not actually
>>>>>> all run on the same node at the same time, and that the scheduler somehow
>>>>>> takes into account that this job array has 9 jobs that will need 9 nodes. I am
>>>>>> building a timeline of the start times of all jobs, and when the array jobs
>>>>>> are due to start, no other jobs are set to run on the remaining nodes (so the
>>>>>> scheduler "saves" the other nodes for the array jobs, even though squeue and
>>>>>> scontrol show them all scheduled on the same node).
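A timeline like the one described above can also be assembled directly from
squeue by listing the expected start times of all pending jobs (a sketch; the
"starttime" output field is assumed to be available in the squeue version in
use):

    # Expected start time, planned nodes and pending reason for every pending
    # job, to compare the array against the rest of the queue.
    squeue --states=PENDING -O jobid,jobarrayid,starttime,schednodes,reason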
>>>>> In general, jobs from an array will be scheduled on whatever nodes
>>>>> fulfil their requirements. The fact that all the jobs have
>>>>>
>>>>> cn06
>>>>>
>>>>> as NODELIST, however, seems to suggest that you have either specified cn06
>>>>> as the node the jobs should run on, or that cn06 is the only node which
>>>>> fulfils the job requirements.
>>>>>
>>>>> I'm not sure what you mean about '"saving" the other nodes'.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Loris
>>>>>