[slurm-users] detectCores() mess

Mike Cammilleri mikec at stat.wisc.edu
Mon Dec 11 13:42:51 MST 2017


Thanks for the responses. It turns out I hadn't investigated deeply enough: although I saw many processes running and a very high load average, the cgroups are indeed constraining each job to its allocated cores, and the extra threads are simply waiting to run on the same cores/threads that were allocated.

I guess that when this happens, the load average in 'top' can show an extremely elevated number because many processes are waiting to run, while the node itself is still largely available, with plenty of idle cores left for other jobs. Would this be an accurate interpretation of the scheduling and load I'm observing? Are there any impacts on the performance of the node when it is in this state?
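For what it's worth, one way to avoid the oversubscription in the first place is to hand the Slurm-allocated core count to the R code rather than letting it call detectCores(), which sees every core on the node. A rough sketch of a submission script (the script name and the --cpus-per-task value are just placeholders):

#!/bin/bash
#SBATCH --cpus-per-task=4
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# pass it to the R script so it only spawns that many workers.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
Rscript analysis.R "$SLURM_CPUS_PER_TASK"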

Thanks everyone.


-----Original Message-----
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Chris Samuel
Sent: Friday, December 8, 2017 6:46 PM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] detectCores() mess

On 9/12/17 4:54 am, Mike Cammilleri wrote:

> I thought cgroups (which we are using) would prevent some of this 
> behavior on the nodes (we are constraining CPU and RAM) -I'd like 
> there to be no I/O wait times if possible. I would like it if either 
> linux or slurm could constrain a job from grabbing more cores than 
> assigned at submit time. Is there something else I should be 
> configuring to safeguard against this behavior? If SLURM assigns 1 cpu 
> to the task then no matter what craziness is in the code, 1 is all 
> they're getting. Possible?

That is exactly what cgroups do: a process within a cgroup that only has a single core available to it will only be able to use that one core. If it fires up (for example) 8 threads or processes, they will all run, but they will all be contending for that single core.

You can check the cgroup for a process with:

cat /proc/$PID/cgroup

From that you should be able to find the cgroup in the cpuset controller and see how many cores are available to it.
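For example, if the cpuset line of /proc/$PID/cgroup shows a path like /slurm/uid_<UID>/job_<JOBID>/step_<STEP>, the allowed cores can be read back from the cpuset controller. The exact mount point and layout depend on your distribution and Slurm's cgroup setup, so treat the path below as illustrative:

# Substitute the real UID and job ID taken from /proc/$PID/cgroup (cgroup v1 layout).
cat /sys/fs/cgroup/cpuset/slurm/uid_<UID>/job_<JOBID>/cpuset.cpus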

You mention I/O wait times; that's separate from the number of cores available to a code. Could you elaborate a little on what you are seeing there?

There is some support for limiting I/O through the blkio cgroup controller in current kernels, but I don't know when that landed or whether it will be in the kernel available to you. Also, I don't remember seeing any mention of support for it in Slurm.

https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt
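As a very rough illustration of what that blkio controller can do (the device number, rate and cgroup name below are only placeholders, and this is separate from anything Slurm sets up for you):

# Cap reads from block device 8:0 to 10 MB/s for tasks in an example cgroup;
# assumes the cgroup v1 blkio controller is mounted at the usual location.
echo "8:0 10485760" > /sys/fs/cgroup/blkio/example/blkio.throttle.read_bps_device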

Best of luck,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
