[slurm-users] Overzealous PartitionQoS Limits
Christoph Brüning
christoph.bruening at uni-wuerzburg.de
Wed May 20 11:38:02 UTC 2020
Quick update:
When we increase the GrpNodes limit, some of the jobs start running.
However, they run on nodes that already have jobs from the "long"
partition running.
To my understanding, that should not change the node count against
which the GrpNodes limit is applied...
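
In case it helps with reproducing: the usage that Slurm currently counts
against a QoS can be inspected, e.g. with the commands below ("long" is
just our local QoS name, and the exact output fields may differ between
versions):

   # group limits per QoS, with current usage shown next to each limit
   scontrol show assoc_mgr flags=qos

   # configured group limits as stored in the accounting database
   sacctmgr show qos name=long format=Name,GrpTRES,MaxWall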
Best,
Christoph
On 20/05/2020 12.00, Christoph Brüning wrote:
> Dear all,
>
> we set up a floating partition as described in SLURM's QoS documentation
> to allow for jobs with a longer-than-usual walltime on a part of our
> cluster: a QoS with GrpCPUs and GrpNodes limits attached to the
> longer-walltime partition, which contains all nodes.
>
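> For reference, this is essentially the floating-partition pattern from
> that documentation; a minimal sketch with placeholder limits (our real
> values differ) would be:
>
>    # slurm.conf: the "long" partition spans all nodes and carries a QoS
>    PartitionName=long Nodes=ALL QOS=long MaxTime=14-00:00:00 State=UP
>
>    # accounting database: create the QoS and cap its aggregate usage
>    # (set via GrpTRES here; older sacctmgr versions used GrpCPUs/GrpNodes)
>    sacctmgr add qos long
>    sacctmgr modify qos long set GrpTRES=cpu=256,node=8
>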
> We observe that jobs are stuck in the queue like:
>
> $ squeue -o "%.7i %.9P %.2t %.6C %.20S %R"
> JOBID PARTITION ST CPUS START_TIME NODELIST(REASON)
> 1108810 long PD 2 N/A (QOSGrpNodeLimit)
> 1108811 long PD 2 N/A (QOSGrpNodeLimit)
> 1108812 long PD 2 N/A (QOSGrpNodeLimit)
> 1108813 long PD 2 N/A (QOSGrpNodeLimit)
> 1108814 long PD 2 N/A (QOSGrpNodeLimit)
> 1108815 long PD 2 N/A (QOSGrpNodeLimit)
> 1108816 long PD 2 N/A (QOSGrpNodeLimit)
> 1108817 long PD 2 N/A (QOSGrpNodeLimit)
> 1108818 long PD 2 N/A (QOSGrpNodeLimit)
> [...]
>
> However, we are not even close to any of the GrpNodes or GrpCPUs limits,
> and there are nodes in MIXED state that should still have free slots for
> two-CPU jobs.
> The mentioned jobs even have the highest priority (except for two jobs
> on a special-hardware partition), and they have an empty "Dependency="
> field.
>
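> In case anyone wants to compare, priority and dependencies can be checked
> e.g. with (job ID taken from the squeue output above):
>
>    sprio -l                       # pending-job priorities, per factor
>    scontrol show job 1108810 | grep -E 'Priority|Dependency|Reason'
>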
> It seems that those jobs are occasionally assigned a start time when the
> scheduler runs, but that is quickly reverted to "N/A".
>
> Did any of you observe this or similar behaviour?
> FWIW, we are running SLURM 17.11 on Debian; an upgrade to 19.05 is
> scheduled for the next couple of weeks.
>
> Best,
> Christoph
>
>
--
Dr. Christoph Brüning
Universität Würzburg
Rechenzentrum
Am Hubland
D-97074 Würzburg
Tel.: +49 931 31-80499