[slurm-users] ignore GPU resources when scheduling CPU-based jobs

Renfro, Michael Renfro at tntech.edu
Mon Jun 15 17:29:17 UTC 2020


So if a GPU job is submitted to a partition containing only GPU nodes, and a non-GPU job is submitted to a partition containing at least some nodes without GPUs, both jobs should be able to run. Priorities should be evaluated on a per-partition basis. I can 100% guarantee that in our HPC, pending GPU jobs don't block non-GPU jobs, and vice versa.

I could see a problem if the GPU job was submitted to a partition containing both types of nodes: if that job was assigned the highest priority for whatever reason (fair share, age, etc.), other jobs in the same partition would have to wait until that job started.

A simple solution would be to make a GPU partition containing only GPU nodes, and a non-GPU partition containing only non-GPU nodes. Submit GPU jobs to the GPU partition, and non-GPU jobs to the non-GPU partition.
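
As a rough sketch, assuming a small cluster (the node names, counts, CPU/GPU figures, and partition names below are made up, and gres.conf details are omitted), the slurm.conf side of that could look something like:

    # Hypothetical nodes: two 8-GPU nodes and ten CPU-only nodes
    GresTypes=gpu
    NodeName=gpunode[01-02] CPUs=32 Gres=gpu:8 State=UNKNOWN
    NodeName=cpunode[01-10] CPUs=32 State=UNKNOWN

    # One partition per node type
    PartitionName=gpu   Nodes=gpunode[01-02] State=UP
    PartitionName=batch Nodes=cpunode[01-10] Default=YES State=UP

Users would then submit GPU work with something like "sbatch -p gpu --gres=gpu:1 job.sh" and everything else with "sbatch -p batch job.sh", so pending jobs in one partition can't hold back jobs in the other.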

Once that works, you could make a partition that includes both types of nodes to reduce idle resources, but jobs submitted to that partition would have to (a) not require a GPU, and (b) use only a limited number of CPUs per node, so that some CPUs stay available for GPU jobs on the nodes that have GPUs.
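
A sketch of that third partition, reusing the made-up node names above: the MaxCPUsPerNode partition option caps how many CPUs jobs from that partition can occupy on any one node, which is the usual way to keep some cores free for GPU jobs on the GPU nodes.

    # Hypothetical overflow partition spanning both node types.
    # With 32-core nodes and 8 GPUs per GPU node, MaxCPUsPerNode=24
    # leaves 8 cores per node for GPU jobs; tune to your hardware.
    PartitionName=any Nodes=gpunode[01-02],cpunode[01-10] MaxCPUsPerNode=24 State=UP

Note that the cap applies to every node in the partition, including the CPU-only ones, so pick a value you can live with cluster-wide.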

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of navin srivastava <navin.altair at gmail.com>
Sent: Saturday, June 13, 2020 10:47 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] ignore GPU resources when scheduling CPU-based jobs


Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of 8 GPUs each; other partitions are a mix of both, with nodes that have 2 GPUs and a few nodes without any GPU.

Regards
Navin


On Sat, Jun 13, 2020, 21:11 navin srivastava <navin.altair at gmail.com> wrote:
Thanks Renfro.

Yes, we have both types of nodes, with and without GPUs.
Some users' jobs require a GPU, and some applications use only CPUs.

The issue happens when a high-priority job is waiting for GPU resources that are not available, and a lower-priority job that needs only CPUs is also left waiting, even though enough CPUs are free.

When I hold the GPU jobs, the CPU jobs go through.

Regards
Navin

On Sat, Jun 13, 2020, 20:37 Renfro, Michael <Renfro at tntech.edu> wrote:
We'll probably need more information to find a solution.

To start, do you have separate partitions for GPU and non-GPU jobs? Do you have nodes without GPUs?

On Jun 13, 2020, at 12:28 AM, navin srivastava <navin.altair at gmail.com> wrote:

Hi All,

In our environment we have GPUs. What I found is that when a high-priority user's job is in the queue waiting for GPU resources, which are almost full and unavailable, jobs submitted by other users that do not require GPUs also sit in the queue, even though plenty of CPU resources are available.

Our scheduling mechanism is FIFO with Fair Tree enabled. Is there any way we can make changes so that CPU-based jobs go through while GPU-based jobs wait until the GPU resources are free?

Regards
Navin.



