[slurm-users] detectCores() mess
Chris Samuel
chris at csamuel.org
Fri Dec 8 17:46:11 MST 2017
On 9/12/17 4:54 am, Mike Cammilleri wrote:
> I thought cgroups (which we are using) would prevent some of this
> behavior on the nodes (we are constraining CPU and RAM) - I'd like
> there to be no I/O wait times if possible. I would like it if either
> linux or slurm could constrain a job from grabbing more cores than
> assigned at submit time. Is there something else I should be
> configuring to safeguard against this behavior? If SLURM assigns 1
> cpu to the task then no matter what craziness is in the code, 1 is
> all they're getting. Possible?
That is exactly what cgroups does: a process within a cgroup that
has only a single core available to it can only use that one core.
If it fires up (for example) 8 threads or processes, they will all
run, but they will all be contending for that single core.
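A quick way to convince yourself of that (the srun flags below are
just an illustration, adjust them for your setup) is to compare what
a job step is allowed to use with what the whole node has:

    # inside an allocation that was given a single CPU
    srun --ntasks=1 --cpus-per-task=1 nproc
    # versus every core on the node
    nproc --all

nproc honours the affinity mask that the cgroup/affinity plugins
set, so it should report 1 inside the step, whereas R's
detectCores() typically reports the full core count of the box,
which is why codes relying on it end up oversubscribing.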
You can check the cgroup for a process with:
cat /proc/$PID/cgroup
From that you should be able to find the cgroup in the cpuset
controller and see how many cores are available to it.
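For instance, something like this (the path is only illustrative,
the exact layout depends on your distro and Slurm's cgroup
configuration):

    # find which cpuset cgroup the process is in
    grep cpuset /proc/$PID/cgroup
    # then read the CPUs it is allowed to use
    cat /sys/fs/cgroup/cpuset/slurm/uid_1000/job_1234/step_0/cpuset.cpus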
You mention I/O wait times; that's separate from the number of
cores available to a code. Could you elaborate a little on what
you are seeing there?
There is some support for I/O throttling in current kernels via the
blkio cgroup controller, but I don't know when that landed or
whether it will be in the kernel available to you. I also don't
remember seeing any mention of support for it in Slurm.
https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt
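For what it's worth, the cgroup v1 interface that document describes
looks roughly like this (the device numbers and the limit here are
made up purely for illustration):

    # cap reads from the block device 8:0 at roughly 10 MB/s for a cgroup
    echo "8:0 10485760" > /sys/fs/cgroup/blkio/mygroup/blkio.throttle.read_bps_device

so it is something you would have to wire up outside of Slurm.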
Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC