[slurm-users] Job array start time and SchedNodes
Thekla Loizou
t.loizou at cyi.ac.cy
Tue Dec 7 08:02:27 UTC 2021
Dear all,
I have noticed that SLURM schedules several jobs from a job array on the
same node, with the same start time and end time, even though each of
these jobs requires the full node. You can see the squeue output below:
JOBID     PARTITION ST START_TIME          NODES SCHEDNODES NODELIST(REASON)
124841_1  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_2  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_3  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_4  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_5  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_6  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_7  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_8  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
124841_9  cpu       PD 2021-12-11T03:58:00 1     cn06       (Priority)
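For reference, the expected start time and scheduled nodes (SCHEDNODES)
of pending jobs can be queried with squeue's --start option, for
example:

    squeue --start -j 124841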
Is this a bug, or am I missing something? Is it because the array tasks
share the same base JOBID and are still in the pending state? I am aware
that the jobs will not actually all run on the same node at the same
time, and that the scheduler somehow takes into account that this job
array has 9 jobs needing 9 nodes: I am building a timeline from the
reported start times of all jobs, and at the time the array jobs are
expected to start, no other jobs are set to run on the remaining nodes
(so it "saves" the other nodes for the jobs of the array, even though
squeue and scontrol show them all scheduled on the same node).
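In case the details matter, the submission looks roughly like the sketch
below (a simplified illustration, not the exact script; the application
name is a placeholder, and the full node is requested here via
--exclusive):

    #!/bin/bash
    #SBATCH --partition=cpu    # partition shown in the squeue output
    #SBATCH --array=1-9        # the 9 array tasks above
    #SBATCH --nodes=1          # one node per task
    #SBATCH --exclusive        # each task needs the whole node

    srun ./my_application      # placeholder for the real workload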
Regards,
Thekla Loizou
HPC Systems Engineer
The Cyprus Institute