[slurm-users] Job array start time and SchedNodes

Thekla Loizou t.loizou at cyi.ac.cy
Tue Dec 7 14:20:32 UTC 2021


Dear Loris,

There is no specific node required for this array. I can verify that 
from "scontrol show job 124841" since the requested node list is empty: 
ReqNodeList=(null)

Also, all 17 nodes of the cluster are identical so all nodes fulfill the 
job requirements, not only node cn06.

By "saving" the other nodes I mean the following: the scheduler estimates 
that the array jobs will start on 2021-12-11T03:58:00, and no other jobs 
are scheduled to run on the other nodes during that time. So it seems 
that the scheduler somehow plans the array jobs across more than one 
node, but this does not show in the squeue or scontrol output.
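
For the timeline, a minimal Python sketch along these lines can flag the 
pattern. It assumes the squeue rows are pasted in as whitespace-separated 
text with the seven columns of the output quoted below, and that every 
job needs a full node. The function name find_overbooked is hypothetical, 
and nothing here queries Slurm itself:

```python
from collections import defaultdict

# Rows copied from the squeue output quoted below. Columns (assumed order):
# JOBID PARTITION ST START_TIME NODES SCHEDNODES NODELIST(REASON)
ROWS = """\
124841_1 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_2 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_3 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_4 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_5 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_6 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_7 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_8 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
124841_9 cpu PD 2021-12-11T03:58:00 1 cn06 (Priority)
"""


def find_overbooked(text):
    """Return {(start_time, schednodes): [job ids]} for every group of
    pending jobs that share the same start time and scheduled node."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip headers, blank lines, or rows in another layout
        jobid, _partition, state, start, _nodes, schednodes, _reason = fields
        if state == "PD":
            groups[(start, schednodes)].append(jobid)
    # Keep only the groups where more than one job claims the same slot.
    return {key: jobs for key, jobs in groups.items() if len(jobs) > 1}


overbooked = find_overbooked(ROWS)
for (start, node), jobs in overbooked.items():
    print(f"{node} at {start}: {len(jobs)} pending jobs ({', '.join(jobs)})")
```

With full-node jobs, any group reported this way cannot actually run as 
displayed, which is the discrepancy I am asking about.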

Regards,

Thekla


On 7/12/21 12:16 PM, Loris Bennett wrote:
> Hi Thekla,
>
> Thekla Loizou <t.loizou at cyi.ac.cy> writes:
>
>> Dear all,
>>
>> I have noticed that SLURM schedules several jobs from a job array on the same
>> node with the same start time and end time.
>>
>> Each of these jobs requires the full node. You can see the squeue output below:
>>
>>   JOBID     PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
>>
>>   124841_1  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_2  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_3  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_4  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_5  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_6  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_7  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_8  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>   124841_9  cpu        PD  2021-12-11T03:58:00  1      cn06        (Priority)
>>
>> Is this a bug or am I missing something? Is this because the jobs have the same
>> JOBID and are still in pending state? I am aware that the jobs will not actually
>> all run on the same node at the same time and that the scheduler somehow takes
>> into account that this job array has 9 jobs that will need 9 nodes. I am
>> creating a timeline with the start time of all jobs and when the job array jobs
>> will start running no other jobs are set to run on the remaining nodes (so it
>> "saves" the other nodes for the jobs of the array even if they are all scheduled
>> to run on the same node based on squeue or scontrol).
> In general jobs from an array will be scheduled on whatever nodes
> fulfil their requirements.  The fact that all the jobs have
>
>    cn06
>
> as NODELIST however seems to suggest that you have either specified cn06
> as the node the jobs should run on, or cn06 is the only node which
> fulfils the job requirements.
>
> I'm not sure what you mean about '"saving" the other nodes'.
>
> Cheers,
>
> Loris
>


