[slurm-users] scheduling issue

Erik Eisold eisold at pks.mpg.de
Thu Aug 20 07:38:08 UTC 2020


Thank you for your reply, and apologies for not reacting sooner; I have 
been kept busy until now. I have attached our partition definitions to 
this mail.

As for your second question: MPI jobs aren't really an issue in our 
cluster. There are a few here and there, but not nearly enough to 
explain up to 20 nodes at a time remaining idle for up to a day while 
the backfill scheduler is configured and there are jobs in the short 
partition that would fit on the idle nodes.
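
For reference, these are the kind of standard sinfo/squeue/scontrol 
calls that show the idle nodes, the pending-job reasons, and the 
backfill settings; the format string and grep pattern are only 
examples, not taken from our actual setup:

    # nodes currently idle, grouped by partition
    sinfo -t idle

    # pending jobs with partition, time limit, node count and reason
    squeue -t PD -o "%.10i %.12P %.8u %.11l %.5D %R"

    # scheduler and backfill settings currently in effect
    scontrol show config | grep -E 'SchedulerType|SchedulerParameters'

Jobs whose reason is "Resources" are the ones backfill tries to 
protect; when short jobs that would finish before their expected start 
still aren't placed on the idle nodes, the bf_window and 
bf_max_job_test values in SchedulerParameters are usually the first 
things to look at.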

Kind regards,
Erik Eisold

On 14/08/2020 14:18, Renfro, Michael wrote:
> We’ve run a similar setup since I moved to Slurm 3 years ago, with no 
> issues. Could you share partition definitions from your slurm.conf?
>
> When you see a bunch of jobs pending, which ones have a reason of 
> “Resources”? Those should be the next ones to run, and ones with a 
> reason of “Priority” are waiting for higher priority jobs to start 
> (including the ones marked “Resources”). The only time I’ve seen nodes 
> sit idle is when there’s an MPI job pending with “Resources”, and if 
> any smaller jobs started, it would delay that job’s start.
>
> --
> Mike Renfro, PhD  / HPC Systems Administrator, Information Technology 
> Services
> 931 372-3601      / Tennessee Tech University
>
>> On Aug 14, 2020, at 4:20 AM, Erik Eisold <eisold at pks.mpg.de> wrote:
>>
>> Our node topology is a bit special: almost all our nodes are in one
>> common partition, a subset of those nodes is then in another
>> partition, and this repeats once more. The only difference between
>> the partitions, apart from the nodes in them, is the maximum run
>> time. The reason I originally set it up this way was to ensure that
>> users with shorter jobs had a quicker response time and the whole
>> cluster wouldn't be clogged up with long-running jobs for days on
>> end; that, and I was new to the whole cluster setup and Slurm
>> itself. I have attached a rough visualization of this setup to this
>> mail. There are two more totally separate partitions that are not in
>> this image.
>>
>> My idea for a solution would be to move all nodes to one common
>> partition and use partition QOS to implement time and resource
>> restrictions, because I think the scheduler is not really meant to
>> handle the type of setup we chose in the beginning.
-------------- next part --------------
PartitionName=debug Nodes=eos[01-02],kalyke[01-72],oberon[01-16],triton[01-16] MaxTime=10:00 DefaultTime=10:00 State=UP PriorityJobFactor=1 QOS=default

PartitionName=extra_long Nodes=iris[01-12],osiris[01-05] MaxTime=28-00:00:00 DefaultTime=02:00:00 State=UP PriorityJobFactor=100 QOS=default
PartitionName=graphic Nodes=dione[01-06],icarus[01-05] MaxTime=2-00:00:00 DefaultTime=01:00:00 State=UP QOS=default PriorityJobFactor=100
PartitionName=long Nodes=amun[01-10],anubis[01-16],apollo[01-04],elara[01-60],flora[01-10],gaspra[01-52],hathor[01-08],hermes[01-08],horus[01-10],ida[01-08],io[01-32],iris[01-12],isis[01-12],kalyke[01-72],kepler[01-32],merkur[01-32],metis[01-72],mimas[01-08],oberon[01-16],pallas[01-08],rhea[01-32],seth[01-02],sinope[01-72],titan[01-08],thalia[01-60],triton[01-16],tycho[01-32] MaxTime=14-00:00:00 DefaultTime=2-00:00:00 State=UP QOS=default PriorityJobFactor=100
PartitionName=medium Nodes=amun[01-10],ananke[01-04],anubis[01-16],apollo[01-04],ceres[01-04],elara[01-60],flora[01-10],gaspra[29-52],hekat[01-05],hermes[01-08],horus[01-10],ida[01-08],io[01-32],iris[01-12],isis[01-12],juno[01-40],kepler[01-32],merkur[01-32],metis[01-72],oberon[01-16],osiris[01-05],rhea[01-32],seth[01-02],sinope[01-72],thalia[01-60],titan[01-08],triton[01-16],tycho[01-32],hathor[01-08],mimas[01-08],pallas[01-08] MaxTime=2-00:00:00 DefaultTime=2:00:00 State=UP QOS=default PriorityJobFactor=100
PartitionName=short Nodes=amun[01-10],ananke[01-04],anubis[01-16],apollo[01-04],ceres[01-04],elara[01-60],flora[01-10],gaspra[01-52],hermes[01-08],hekat[01-05],horus[01-10],ida[01-08],io[01-32],iris[01-12],juno[01-40],kepler[01-32],leda[01-72],merkur[01-32],metis[01-72],oberon[01-16],osiris[01-05],rhea[01-32],sinope[01-72],snowy[01-20],titan[01-08],triton[01-16],tycho[01-32],hathor[01-08],mimas[01-08],pallas[01-08],thalia[01-60] MaxTime=2:00:00 DefaultTime=1:00:00 State=UP Default=YES QOS=default PriorityJobFactor=100
PartitionName=testing Nodes=icarus[01-05],snowy[01-20],titan[01-08],openpower MaxTime=2-00:00:00 DefaultTime=01:00:00 State=UP QOS=default AllowGroups=edv
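
A rough sketch of the consolidated setup described above (one 
partition plus QOS-based limits). Whether the limits end up on a 
partition QOS or on job QOSes that users pick with --qos would still 
need to be worked out; the sketch below uses job QOSes, and all names, 
node lists and limit values are placeholders rather than a tested 
configuration:

    # slurm.conf: a single partition holding all nodes
    PartitionName=main Nodes=ALL Default=YES State=UP MaxTime=28-00:00:00 DefaultTime=02:00:00

    # QOS levels carrying the old per-partition limits (via sacctmgr)
    sacctmgr add qos short
    sacctmgr modify qos short set MaxWall=02:00:00
    sacctmgr add qos medium
    sacctmgr modify qos medium set MaxWall=2-00:00:00
    sacctmgr add qos long
    sacctmgr modify qos long set MaxWall=14-00:00:00 GrpTRES=cpu=2048

    # note: QOS limits are only enforced when AccountingStorageEnforce
    # includes "limits" (and "qos" to restrict which QOS may be used)

    # users then select a QOS at submit time
    sbatch --qos=short job.sh

The GrpTRES value above is just a placeholder for whatever share of 
the cluster the long jobs should be capped at.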

