[slurm-users] disable-bindings disables counting of gres resources

Chris Samuel chris at csamuel.org
Sun Apr 14 01:12:40 UTC 2019


On Monday, 25 March 2019 2:30:34 AM PDT Peter Steinbach wrote:

> I observed a weird behavior of the '--gres-flags=disable-binding'
> option. With the above .conf files, I created a local slurm cluster with
> 3 computes (2 GPUs and 4 cores each).

First of all, you will want to use cgroups to ensure that processes that do
not request GPUs cannot access them.
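
As a minimal sketch of that setup, assuming the task/cgroup plugin is
available on the compute nodes and that gres.conf names the GPU device files
with File= (none of these lines are taken from the original poster's .conf
files):

```conf
# slurm.conf (assumed excerpt): track and confine job processes with cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf (assumed excerpt): only expose devices the job was allocated
ConstrainDevices=yes
```

With ConstrainDevices=yes, a job that did not request a GPU should not be
able to open the GPU device files at all.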

Secondly, do your CPUs have hyperthreading enabled, by any chance?
If so, your gres.conf is likely wrong: you will want to list only the first
hardware thread (HT) of each core that you want to restrict access to.

From the manual page for gres.conf:

              NOTE: If your cores contain multiple threads only list the first thread
              of each core. The logic is such that it uses core instead of thread
              scheduling per GRES. Also note that since Slurm must be able to perform
              resource management on heterogeneous clusters having various core ID
              numbering schemes, an abstract index will be used instead of the physical
              core index. That abstract id may not correspond to your physical core
              number. Basically Slurm starts numbering from 0 to n, being 0 the id of
              the first processing unit (core or thread if HT is enabled) on the first
              socket, first core and maybe first thread, and then continuing
              sequentially to the next thread, core, and socket. The numbering generally
              coincides with the processing unit logical number (PU L#) seen in lstopo
              output.
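
Read literally, the note above suggests something like the following for a
node with 2 GPUs and 4 cores. This is only a sketch: it assumes 2 threads per
core (so processing units 0-7, with the first thread of each core at 0, 2, 4
and 6), hypothetical device paths, and an arbitrary split of cores between
the GPUs. Check the numbering against lstopo on the actual node first.

```conf
# gres.conf (hypothetical example, not the original poster's file)
# Only the first thread of each core is listed, as the NOTE asks.
Name=gpu File=/dev/nvidia0 Cores=0,2
Name=gpu File=/dev/nvidia1 Cores=4,6
```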

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA





