Hi Dietmar,
Dietmar Rieder via slurm-users slurm-users@lists.schedmd.com writes:
Hi Loris,
On 4/30/24 2:53 PM, Loris Bennett via slurm-users wrote:
Hi Dietmar,

Dietmar Rieder via slurm-users slurm-users@lists.schedmd.com writes:
Hi,
is it possible to have Slurm automatically schedule jobs to a fitting partition according to the "-t" time requirement?
e.g. 3 partitions
PartitionName=standard Nodes=c-[01-10] Default=YES MaxTime=04:00:00  DefaultTime=00:10:00 State=UP OverSubscribe=NO
PartitionName=medium   Nodes=c-[04-08] Default=NO  MaxTime=24:00:00  DefaultTime=04:00:00 State=UP OverSubscribe=NO
PartitionName=long     Nodes=c-[09-10] Default=NO  MaxTime=336:00:00 DefaultTime=24:00:00 State=UP OverSubscribe=NO
So in the standard partition, which is the default, we have all nodes and a max time of 4h; in the medium partition we have 4 nodes with a max time of 24h; and in the long partition we have 2 nodes with a max time of 336h.
I was hoping that if I submit a job with -t 01:00:00 it can be run on any node (standard partition), whereas when specifying -t 05:00:00 or -t 48:00:00 the job will run on the nodes of the medium or long partition respectively.
However, my job will not get scheduled at all when -t is greater than 01:00:00
i.e.
]$ srun --cpus-per-task 1 -t 01:00:01 --pty bash
srun: Requested partition configuration not available now
srun: job 42095 queued and waiting for resources
It will wait forever because the standard partition is selected. I was thinking that Slurm would automatically switch to the medium partition.
Do I misunderstand something here? Or can this somehow be configured?
You can specify multiple partitions, e.g.

  $ salloc --cpus-per-task=1 --time=01:00:01 --partition=standard,medium,long

Notice that rather than using 'srun ... --pty bash', as far as I understand, the preferred method is to use 'salloc' as above, and to use 'srun' for starting MPI processes.
Thanks for the hint. This works nicely, but it would be nice if I did not need to specify the partition at all. Any thoughts?
I am not aware that you can set multiple partitions as a default.
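One thing you could look at, although I have not tried it myself, is setting Slurm's input environment variables system-wide, so that users get a multi-partition default without typing anything. A rough, untested sketch (the file name is just an example):

  # e.g. in /etc/profile.d/slurm_partitions.sh:
  # gives every submission a multi-partition default, which users can
  # still override with an explicit --partition
  export SBATCH_PARTITION=standard,medium,long
  export SALLOC_PARTITION=standard,medium,long
  export SRUN_PARTITION=standard,medium,long

The other obvious route would be a job_submit.lua plugin that picks a partition from the requested time limit, but that is more work to set up and test.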
The question is why you actually need partitions with different maximum runtimes.
In our case, a university cluster with a very wide range of codes and usage patterns, multiple partitions would probably lead to fragmentation and wasted resources, because the job mix does not always fit the various partitions well. I am therefore a member of the "as few partitions as possible" camp, and so in our set-up we essentially have only one partition with a DefaultTime of 14 days. We do, however, let users set a QOS to gain a priority boost in return for accepting a shorter run-time and a reduced maximum number of cores.
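Purely for illustration (the QOS name and the limits below are invented, not our actual values), such a QOS can be created with sacctmgr roughly like this:

  # Hypothetical QOS: higher priority in exchange for a shorter wall-time
  # cap and a per-user CPU limit (name and numbers made up for illustration)
  sacctmgr add qos short set Priority=100 MaxWall=3-00:00:00 MaxTRESPerUser=cpu=256

  # The QOS also has to be added to the users' associations, e.g.:
  sacctmgr modify user where name=someuser set qos+=short

  # Users then opt in at submission time:
  sbatch --qos=short --time=2-00:00:00 job.sh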
Occasionally people complain about short jobs having to wait in the queue for too long, but I have generally been successful in solving the problem by having them estimate their resource requirements better or bundle their work in order to increase the run-time-to-wait-time ratio.
Cheers,
Loris