Hi George,

George Leaver via slurm-users <slurm-users@lists.schedmd.com> writes:
> Hi Loris,
>
>> Doesn't splitting up your jobs over two partitions mean that either one
>> of the two partitions could be full, while the other has idle nodes?
>
> Yes, potentially, and we may move away from our current config at some
> point (it's a bit of a hangover from an SGE cluster.) Hasn't really been
> an issue at the moment.
>
> Do you find fragmentation a problem? Or do you just let the bf scheduler
> handle that (assuming jobs have a realistic wallclock request?)
Well, with essentially only one partition we don't have fragmentation related to that. When we did have multiple partitions for different run-times, we did see fragmentation. However, I couldn't see any advantage in that setup, so we moved to a single partition plus various QOS to handle, say, test or debug jobs. Users do still sometimes add fairly arbitrary constraints to their job scripts, such as the number of nodes for MPI jobs. While in principle it may be a good idea to reduce the MPI overhead by reducing the number of nodes, in practice any such advantage may well be cancelled out or exceeded by the extra time the job will have to wait for those specific resources.
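In case it helps anyone searching the archives, roughly what that setup looks like; the QOS name "debug" and all the limits below are made-up examples, not our actual values:

    # Create a QOS for short test/debug jobs (name and limits are examples):
    sacctmgr add qos debug
    sacctmgr modify qos debug set Priority=1000 MaxWall=01:00:00 MaxTRESPU=cpu=32

    # Let users request it (here per user):
    sacctmgr modify user alice set qos+=debug

    # slurm.conf must enforce QOS limits:
    #   AccountingStorageEnforce=limits,qos

    # Users then submit test jobs with e.g.:
    sbatch --qos=debug job.sh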
Backfill works fairly well for us, although indeed not without a little badgering of users to get them to specify appropriate run-times.
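The main things to look at in slurm.conf are a sensible DefaultTime/MaxTime on the partition and the backfill parameters; the values here are placeholders, not a recommendation:

    # slurm.conf (illustrative values only)
    # A per-partition DefaultTime stops jobs submitted without a time limit
    # from defaulting to MaxTime and blinding the backfill scheduler:
    PartitionName=batch Nodes=node[001-100] Default=YES DefaultTime=01:00:00 MaxTime=7-00:00:00 State=UP

    # Backfill tuning: look a week ahead and keep scanning the queue:
    SchedulerType=sched/backfill
    SchedulerParameters=bf_window=10080,bf_continue,bf_max_job_test=1000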
> But for now, would be handy if users didn't need to adjust their
> jobscripts (or we didn't need to write a submit script.)
If you ditch one of the partitions, you could always use a job submit plug-in to replace the invalid partition specified by the job with the remaining partition.
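Something along these lines, completely untested and with made-up partition names ("old" being the one you retire, "batch" the one you keep); it would live in job_submit.lua next to slurm.conf, with JobSubmitPlugins=lua set:

    -- job_submit.lua: reroute jobs that still request the retired partition.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.partition == "old" then
            slurm.log_info("job_submit: rerouting job from uid %u to 'batch'",
                           submit_uid)
            job_desc.partition = "batch"
        end
        return slurm.SUCCESS
    end

    -- Required by the plugin even if it does nothing:
    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end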
Cheers,
Loris
> Regards,
> George
>
> --
> George Leaver
> Research Infrastructure, IT Services, University of Manchester
> http://ri.itservices.manchester.ac.uk | @UoM_eResearch