[slurm-users] Revisit: Split a GPU cluster into GPU cores and shared cores

Christopher Samuel chris at csamuel.org
Wed Apr 18 17:58:16 MDT 2018


On 19/04/18 07:11, Barry Moore wrote:

> My situation is similar. I have a GPU cluster with gres.conf entries 
> which look like:
> 
> NodeName=gpu-XX Name=gpu File=/dev/nvidia[0-1] CPUs=[0-5]
> NodeName=gpu-XX Name=gpu File=/dev/nvidia[2-3] CPUs=[6-11]
> 
> However, as you can imagine, 8 cores sit idle on these machines for no 
> reason. Is there a way to easily set this up?

We do this with overlapping partitions:

PartitionName=skylake Default=YES State=DOWN [...] MaxCPUsPerNode=32
PartitionName=skylake-gpu Default=NO State=DOWN [...] Priority=1000
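
Both partitions contain the same nodes; with hypothetical node names filled
in (the real definitions are elided above), the overlap looks roughly like:

PartitionName=skylake Default=YES Nodes=gpu-node[01-10] MaxCPUsPerNode=32 [...]
PartitionName=skylake-gpu Default=NO Nodes=gpu-node[01-10] Priority=1000 [...]

Our nodes have 36 cores (see the gres.conf below), so MaxCPUsPerNode=32 on
the skylake partition leaves four cores per node that only skylake-gpu jobs
can allocate.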

Our submit filter then forces jobs that request gres=gpu into the
skylake-gpu partition and those that don't into the skylake partition.
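
Roughly speaking, a job_submit/lua sketch of that routing (enabled via
JobSubmitPlugins=lua in slurm.conf; this is just an illustration rather
than our production filter, and the job_desc.gres field name varies
between Slurm releases) looks something like:

-- job_submit.lua: route GPU requests to the GPU partition (sketch only)
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Anything asking for a GPU goes to the GPU partition,
    -- everything else to the CPU-only partition.
    if job_desc.gres ~= nil and string.find(job_desc.gres, "gpu") then
        job_desc.partition = "skylake-gpu"
    else
        job_desc.partition = "skylake"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end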

Our gres.conf has:

NodeName=[...] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0-17
NodeName=[...] Name=gpu Type=p100 File=/dev/nvidia1 Cores=18-35

But of course the Cores= spec is only advisory to the scheduler;
a user can make it a hard requirement by specifying:

--gres-flags=enforce-binding
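
For example (the job script name here is just a placeholder), a
submission like:

sbatch --gres=gpu:p100:1 --gres-flags=enforce-binding job.sh

forces the allocated CPU cores to be ones listed for the allocated GPU
in gres.conf, rather than treating Cores= as a hint.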

We do have the issue that the four free cores all end up on one socket,
rather than being distributed evenly across the sockets. When I
solicited advice from SchedMD on our config, it seemed they were
doing some work in this area that may surface in the next
major release (though likely only as a "beta" proof of concept):

https://bugs.schedmd.com/show_bug.cgi?id=4717

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


