[slurm-users] Job allocating more CPUs than requested
Ryan Novosielski
novosirj at rutgers.edu
Fri Sep 21 22:35:34 MDT 2018
On 09/21/2018 11:22 PM, Chris Samuel wrote:
> On Saturday, 22 September 2018 2:53:58 AM AEST Nicolas Bock wrote:
>
>> shows as requesting 1 CPU when in queue, but then allocates all
>> CPU cores once running. Why is that?
>
> Do you mean that Slurm expands the cores requested to all the cores
> on the node or allocates the node in exclusive mode, or do you mean
> that the code inside the job uses all the cores on the node instead
> of what was requested?
>
> The latter is often the case for badly behaved codes and that's why
> using cgroups to contain applications is so important.
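(Agreed on cgroups -- for anyone reading along in the archives, the containment
Chris describes is what Slurm's cgroup task plugin provides. A minimal sketch of
the usual settings, assuming a reasonably recent Slurm; consult your version's
documentation before copying anything:

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainCores=yes    # confine each job to the cores it was allocated

With ConstrainCores=yes, a job that spawns more threads than it requested can
only oversubscribe its own cores, not its neighbors'.)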
I apologize for potentially hijacking the thread here, but I think this is
in the spirit of the original question.
We constrain using cgroups, and occasionally someone will request one core
(-n1 -c1) and then run something that asks for far more cores/threads, or
that tries to use the whole machine. Obviously they won't succeed. Is this
any sort of problem? It seems to me that scheduling 24 threads on a single
core must add some context-switching overhead, and that I/O could increase,
but I'm not sure. What I do know is that if someone does this in the
extreme -- say, running something with -n24 inside that one-core allocation,
where each task itself tries to run 24 threads -- and other users' jobs
occupy the remaining 23 cores, you'll end up with a load average near
24*24+23 = 599. Does this make any difference in practice? We have NHC set
to offline such nodes, but that affects job preemption. What sort of
choices do others make in this area?
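For what it's worth, the advice we give users is to tie their thread count
to what they actually requested. A sketch of the kind of batch script we
suggest (the application name is just illustrative; SLURM_CPUS_PER_TASK is
set by Slurm when -c is used):

    #!/bin/bash
    #SBATCH -n1 -c8        # one task, eight cores

    # Tell common threading runtimes (OpenMP etc.) how many cores
    # the job really has, defaulting to 1 if -c was not given.
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

    srun ./my_threaded_app    # hypothetical application

That at least keeps well-behaved codes from oversubscribing their own
cgroup, though it does nothing for the badly behaved ones.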
--
____
|| \\UTGERS, |----------------------*O*------------------------
||_// the State | Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark
`'