[slurm-users] detectCores() mess
Chris Samuel
chris at csamuel.org
Fri Dec 8 17:46:11 MST 2017
On 9/12/17 4:54 am, Mike Cammilleri wrote:
> I thought cgroups (which we are using) would prevent some of this
> behavior on the nodes (we are constraining CPU and RAM) - I'd like
> there to be no I/O wait times if possible. I would like it if either
> linux or slurm could constrain a job from grabbing more cores than
> assigned at submit time. Is there something else I should be
> configuring to safeguard against this behavior? If SLURM assigns 1
> cpu to the task then no matter what craziness is in the code, 1 is
> all they're getting. Possible?
That is exactly what cgroups does: a process within a cgroup that
has only a single core available to it can only use that one core.
If it fires up (for example) 8 threads or processes, they will all
run, but they will all be contending for that single core.
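A quick way to convince yourself of that (the srun flags below are
just an illustration, adjust them for your setup) is to compare what
a job step is allowed to use with what the whole node has:

    # inside an allocation that was given a single CPU
    srun --ntasks=1 --cpus-per-task=1 nproc
    # versus every core on the node
    nproc --all

nproc honours the affinity mask that the cgroup/affinity plugins
set, so it should report 1 inside the step, whereas R's
detectCores() typically reports the full core count of the box,
which is why codes relying on it end up oversubscribing.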
You can check the cgroup for a process with:
cat /proc/$PID/cgroup
From that you should be able to find the cgroup in the cpuset
controller and see how many cores are available to it.
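For instance, something like this (the path is only illustrative,
the exact layout depends on your distro and Slurm's cgroup
configuration):

    # find which cpuset cgroup the process is in
    grep cpuset /proc/$PID/cgroup
    # then read the CPUs it is allowed to use
    cat /sys/fs/cgroup/cpuset/slurm/uid_1000/job_1234/step_0/cpuset.cpus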
You mention I/O wait times; that's separate from the number of
cores available to a code. Could you elaborate a little on what
you are seeing there?
There is some support for I/O throttling in current kernels via the
blkio cgroup controller, but I don't know when that landed or
whether it will be in the kernel available to you. I also don't
remember seeing any mention of support for it in Slurm.
https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt
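For what it's worth, the cgroup v1 interface that document describes
looks roughly like this (the device numbers and the limit here are
made up purely for illustration):

    # cap reads from the block device 8:0 at roughly 10 MB/s for a cgroup
    echo "8:0 10485760" > /sys/fs/cgroup/blkio/mygroup/blkio.throttle.read_bps_device

so it is something you would have to wire up outside of Slurm.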
Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC