[slurm-users] ignore GPU resources when scheduling CPU-based jobs

navin srivastava navin.altair at gmail.com
Tue Jun 16 05:23:29 UTC 2020


Thanks Renfro.

I will set up a similar configuration and see how it goes.

Regards

On Mon, Jun 15, 2020, 23:02 Renfro, Michael <Renfro at tntech.edu> wrote:

> So if a GPU job is submitted to a partition containing only GPU nodes, and
> a non-GPU job is submitted to a partition containing at least some nodes
> without GPUs, both jobs should be able to run. Priorities should be
> evaluated on a per-partition basis. I can 100% guarantee that in our HPC,
> pending GPU jobs don't block non-GPU jobs, and vice versa.
>
> I could see a problem if the GPU job was submitted to a partition
> containing both types of nodes: if that job was assigned the highest
> priority for whatever reason (fair share, age, etc.), other jobs in the
> same partition would have to wait until that job started.
>
> A simple solution would be to make a GPU partition containing only GPU
> nodes, and a non-GPU partition containing only non-GPU nodes. Submit GPU
> jobs to the GPU partition, and non-GPU jobs to the non-GPU partition.
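>
> As a rough sketch only (the node and partition names below are
> placeholders, not your actual configuration), the slurm.conf side of
> that could look something like:
>
>     # GPU nodes and CPU-only nodes (hypothetical names and counts;
>     # assumes GresTypes=gpu and a matching gres.conf are already in place)
>     NodeName=gpu[01-02] CPUs=32 Gres=gpu:8 State=UNKNOWN
>     NodeName=cpu[01-10] CPUs=32 State=UNKNOWN
>
>     # One partition per node type
>     PartitionName=gpu   Nodes=gpu[01-02] MaxTime=INFINITE State=UP
>     PartitionName=batch Nodes=cpu[01-10] Default=YES MaxTime=INFINITE State=UP
>
> Users would then submit GPU work with something like
> "sbatch -p gpu --gres=gpu:1 job.sh" and CPU-only work with
> "sbatch -p batch job.sh".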
>
> Once that works, you could make a partition that includes both types of
> nodes to reduce idle resources, but jobs submitted to that partition would
> have to (a) not require a GPU, and (b) use only a limited number of CPUs
> per node, so that some CPUs remain available for GPU jobs on the nodes
> containing GPUs.
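>
> A minimal sketch of such a mixed partition (again with placeholder names
> and an example limit; the MaxCPUsPerNode partition parameter caps how many
> CPUs jobs from this partition can use on any one node, leaving the rest
> free for jobs in the GPU partition):
>
>     PartitionName=any Nodes=gpu[01-02],cpu[01-10] MaxCPUsPerNode=24 MaxTime=INFINITE State=UP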
>
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> navin srivastava <navin.altair at gmail.com>
> *Sent:* Saturday, June 13, 2020 10:47 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] ignore GPU resources when scheduling
> CPU-based jobs
>
>
> Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of
> 8 GPUs each, and other partitions are a mix: nodes with 2 GPUs plus a
> few nodes without any GPU.
>
> Regards
> Navin
>
>
> On Sat, Jun 13, 2020, 21:11 navin srivastava <navin.altair at gmail.com>
> wrote:
>
> Thanks Renfro.
>
> Yes, we have both types of nodes, with and without GPUs.
> Also, some users' jobs require a GPU and some applications use only CPUs.
>
> So the issue happens when a high-priority job is waiting for GPU
> resources that are not available, while a lower-priority job that needs
> only CPU resources keeps waiting even though enough CPUs are free.
>
> When I hold the GPU jobs, the CPU jobs go through.
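>
> (In case it is useful, the pending reason for each job can be checked with
> something like "squeue --state=PENDING -o '%.10i %.9P %.8u %.8T %R'";
> a rough guess without seeing the output is that the blocked CPU-only jobs
> would show Reason=Priority while the pending GPU job shows Resources.)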
>
> Regards
> Navin
>
> On Sat, Jun 13, 2020, 20:37 Renfro, Michael <Renfro at tntech.edu> wrote:
>
> Will probably need more information to find a solution.
>
> To start, do you have separate partitions for GPU and non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava <navin.altair at gmail.com>
> wrote:
>
> Hi All,
>
> In our environment we have GPUs. What I have found is that if a user with
> high priority has a job queued and waiting for GPU resources, which are
> almost full and not available, then jobs submitted by other users that do
> not require GPU resources also stay queued, even though plenty of CPU
> resources are available.
>
> Our scheduling mechanism is FIFO with Fair Tree enabled. Is there any way
> we can make changes so that CPU-only jobs go through while GPU jobs wait
> until GPU resources are free?
>
> Regards
> Navin.
>
>
>
>
>