[slurm-users] Revisit: Split a GPU cluster into GPU cores and shared cores

Barry Moore moore0557 at gmail.com
Wed Apr 18 15:11:57 MDT 2018


Hello All,

I came across the post below from 2014 and was wondering whether anyone has since found a good
solution:

https://groups.google.com/forum/#!searchin/slurm-users/split$20cores$20partition%7Csort:date/slurm-users/R43s9MBPtZ8/fGkIvSVMdHUJ

My situation is similar. I have a GPU cluster with gres.conf entries which
look like:

NodeName=gpu-XX Name=gpu File=/dev/nvidia[0-1] CPUs=0-5
NodeName=gpu-XX Name=gpu File=/dev/nvidia[2-3] CPUs=6-11

However, as you can imagine, 8 cores sit idle on these machines for no
reason. Is there an easy way to set this up? The earlier post mentioned using
a QOS, but if, for example, Slurm fills up CPUs 0-7 and leaves only 8-11 for
the GPUs, that would be disastrous. I could use block distribution by default,
but I don't think I can guarantee that N cores on a socket stay idle whenever
N GPUs are idle. It might be worth noting that I am trying to avoid preempting
these resources, although preemption may turn out to be the only way (e.g. GPU
jobs preempt CPU-only ones); a rough sketch of what I mean is below.
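
For context, the preemption setup I have in mind would look roughly like the
slurm.conf sketch below. This is only a sketch: the partition names, node
names, PriorityTier values, and the MaxCPUsPerNode cap are placeholders I made
up, not something I have tested.

# slurm.conf (sketch): two overlapping partitions on the same GPU nodes.
# The GPU partition sits in a higher PriorityTier, so GPU jobs can
# requeue CPU-only jobs occupying cores they need.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# "gpu" partition: not preemptable itself.
PartitionName=gpu Nodes=gpu-[01-04] PriorityTier=2 PreemptMode=OFF

# "cpuidle" partition: CPU-only jobs scavenge idle cores and can be requeued.
# MaxCPUsPerNode caps how many cores per node CPU-only jobs may take,
# but it does not control WHICH cores they land on.
PartitionName=cpuidle Nodes=gpu-[01-04] PriorityTier=1 PreemptMode=REQUEUE MaxCPUsPerNode=8

That is the kind of setup I would rather avoid if there is a cleaner way to
keep the GPU-bound cores free for GPU jobs.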

Thanks,

Barry

-- 
Barry E Moore II, PhD
E-mail: bmooreii at pitt.edu

Assistant Research Professor
Center for Research Computing
University of Pittsburgh
Pittsburgh, PA 15260