[slurm-users] Job array start time and SchedNodes

Thekla Loizou t.loizou at cyi.ac.cy
Tue Dec 7 08:02:27 UTC 2021


Dear all,

I have noticed that SLURM schedules several jobs from a job array on the 
same node with the same start time and end time.

Each of these jobs requires a full node. You can see the squeue output 
below:

              JOBID  PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
           124841_1  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_2  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_3  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_4  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_5  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_6  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_7  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_8  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
           124841_9  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)

Is this a bug, or am I missing something? Is it because the tasks share 
the same JOBID and are all still pending? I am aware that the jobs will 
not actually all run on the same node at the same time, and that the 
scheduler somehow takes into account that this array has 9 jobs needing 
9 nodes: I am building a timeline from the start times of all jobs, and 
at the time the array jobs are due to start, no other jobs are scheduled 
on the remaining nodes. In other words, the scheduler "saves" the other 
nodes for the array tasks, even though squeue and scontrol report all of 
them as scheduled on the same node.
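
For reference, below is a minimal sketch of how such a timeline can be 
assembled from squeue output. It assumes squeue supports --noheader and 
the --Format fields JobArrayID, StartTime and SchedNodes (field names 
may vary between Slurm versions), and the array job ID is just the one 
from the example above:

    #!/usr/bin/env python3
    # Sketch: collect the expected start time and scheduled nodes for each
    # task of a job array via squeue, to build a start-time timeline.
    # Assumes squeue supports --noheader and the --Format fields
    # JobArrayID, StartTime and SchedNodes (names may vary by Slurm version).
    import subprocess

    ARRAY_JOBID = "124841"  # array job ID taken from the example above

    def array_timeline(jobid: str):
        """Return (task_id, start_time, sched_nodes) tuples sorted by start time."""
        out = subprocess.run(
            ["squeue", "-j", jobid, "--noheader",
             "--Format=JobArrayID:20,StartTime:25,SchedNodes:20"],
            capture_output=True, text=True, check=True,
        ).stdout
        timeline = []
        for line in out.splitlines():
            fields = line.split()
            if len(fields) >= 3:
                timeline.append((fields[0], fields[1], fields[2]))
        # ISO-8601 start times sort chronologically as plain strings
        return sorted(timeline, key=lambda t: t[1])

    if __name__ == "__main__":
        for task_id, start_time, sched_nodes in array_timeline(ARRAY_JOBID):
            print(f"{task_id:12s} {start_time:22s} {sched_nodes}")

With the pending array above, every task prints the same start time and 
the same scheduled node (cn06), which is exactly the behaviour I am 
asking about.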

Regards,
Thekla Loizou
HPC Systems Engineer
The Cyprus Institute


