[slurm-users] Gres CPU binding in 17.02.10
崔灏 (CUI Hao)
cuihao.leo at gmail.com
Wed May 30 02:40:43 MDT 2018
We are facing the same issue as the one described in SLURM Bug 4717 [1].
There, Moe Jette clarified that the statement in the old gres.conf man
page about the Cores= parameter, that "only the identified cores can be
allocated with each generic resource", is wrong, and that the core
binding is only advisory by default.
We are running SLURM version 17.02.10, where the CPUs= parameter is used
instead of the Cores= parameter. I assume the two have exactly the same
effect. However, CPUs= does seem to enforce CPU binding on our cluster:
if one user allocates all the cores bound to a GPU, that GPU can no
longer be allocated. (So --gres-flags=enforce-binding appears to be the
default behaviour, and I haven't found a way to disable it.)
I checked the RELEASE_NOTES of 17.02 and 17.11, but didn't find any
mention of such a behaviour change. Can anyone tell whether this is the
expected behaviour in version 17.02.10, a bug that hasn't been fixed in
17.02.10, or a mistake in our configuration?
Here is our gres.conf:
NodeName=wmc-slave-g[1-3] Name=gpu File=/dev/nvidia[0-1] CPUs=0-11
NodeName=wmc-slave-g[1-3] Name=gpu File=/dev/nvidia[2-3] CPUs=12-23
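
To make the symptom concrete, here is a toy Python model of the two
behaviours. It is only a sketch and is NOT Slurm's scheduling code: the
function name allocatable_gpus is made up for illustration, and it
assumes 24 cores per node (as implied by the CPUs= ranges above) and a
new job asking for one core and one GPU. What we observe on our cluster
matches the enforce_binding=True case.

```python
# Toy model of advisory vs. enforced GRES core binding.
# This is NOT Slurm's scheduler code, only an illustration.

# GPU index -> cores listed for it in CPUs= (taken from the gres.conf above).
GPU_CORES = {
    0: set(range(0, 12)),
    1: set(range(0, 12)),
    2: set(range(12, 24)),
    3: set(range(12, 24)),
}
ALL_CORES = set(range(24))  # assumption: 24 cores per node


def allocatable_gpus(busy_cores, enforce_binding):
    """Return the idle GPUs a new 1-core, 1-GPU job could still get."""
    free = ALL_CORES - set(busy_cores)
    if not free:
        return []  # no free core at all, so nothing can start
    if not enforce_binding:
        # Advisory binding: any free core may be paired with any GPU.
        return sorted(GPU_CORES)
    # Enforced binding: a GPU needs a free core from its own CPUs= list.
    return sorted(g for g, cores in GPU_CORES.items() if cores & free)


# A CPU-only job occupies cores 0-11; all four GPUs are still idle.
busy = range(0, 12)
print(allocatable_gpus(busy, enforce_binding=False))  # [0, 1, 2, 3]
print(allocatable_gpus(busy, enforce_binding=True))   # [2, 3]
```

With advisory binding, GPUs 0 and 1 should still be usable from cores
12-23; in the enforced case they are stranded until cores 0-11 are
released.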
[1] https://bugs.schedmd.com/show_bug.cgi?id=4717
--
崔灏 / CUI Hao
Homepage: i-yu.me
Twitter: @cuihaoleo