[slurm-users] Unprivileged cgroups v2
Maksim Melnik Storetvedt
maksim.melnik.storetvedt at cern.ch
Mon Aug 21 12:11:31 UTC 2023
We frequently encounter Slurm in use across the WLCG, where it provides the slots in which we (ALICE) run our job pilots. With the emergence of more multicore-oriented workflows, these pilots are increasingly tasked with managing the resources available within each slot, so as to make the best use of what we are given. Users often request arbitrary resources (CPU and memory in particular), and several user payloads run in parallel in the same slot (as seen by the BQ), which has made this process increasingly challenging.
One interesting development is the arrival of cgroups v2, which provides a means of delegating controllers to unprivileged users. This is a very useful feature for our use case, as it would let us further subdivide the resources given to us within each slot, allowing the pilot to better "box in" each subjob.
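As a rough illustration of what "boxing in" a subjob could look like from the pilot's side, here is a minimal Python sketch. It assumes the slot's cgroup has already been delegated to the pilot's user and that the memory controller is available in it; the function name box_in_subjob, the leaf name "pilot" and the child name passed in are hypothetical and only serve the example.

```python
from pathlib import Path


def box_in_subjob(slot: Path, name: str, pid: int, memory_max: int) -> Path:
    """Create a child cgroup under `slot`, cap its memory and move `pid` into it."""
    # cgroups v2 does not allow a non-root cgroup to hold processes itself while
    # it distributes controllers to its children, so any processes sitting
    # directly in the slot are first parked in a leaf cgroup.
    leaf = slot / "pilot"  # hypothetical name for the pilot's own leaf
    leaf.mkdir(exist_ok=True)
    for existing in (slot / "cgroup.procs").read_text().split():
        # One PID per write is what the kernel interface expects.
        (leaf / "cgroup.procs").write_text(existing)

    # Hand the memory controller down to the children of the slot.
    (slot / "cgroup.subtree_control").write_text("+memory")

    # Create the subjob's cgroup, set its limit, then move the payload in.
    child = slot / name
    child.mkdir(exist_ok=True)
    (child / "memory.max").write_text(str(memory_max))
    (child / "cgroup.procs").write_text(str(pid))
    return child
```

A call such as box_in_subjob(slot, "subjob-1", payload_pid, 2 * 1024**3) would then cap that payload at 2 GiB, with slot pointing at the cgroup Slurm created for the job step.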
That said, in order to delegate controllers (e.g. for memory) to an unprivileged user, that user must be given ownership of the new cgroup created for them by Slurm, as well as of the cgroup.subtree_control and cgroup.procs files within that cgroup.
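To make the requirement concrete, here is a small Python sketch that a pilot could use to check whether its cgroup has actually been delegated. It assumes the unified hierarchy is mounted at /sys/fs/cgroup; the function names are hypothetical. The list of files follows the kernel's cgroup v2 documentation, which also names cgroup.threads alongside the two files mentioned above.

```python
import os
from pathlib import Path

# Files that must be writable by the user for the cgroup to count as delegated.
DELEGATION_FILES = ("cgroup.procs", "cgroup.threads", "cgroup.subtree_control")


def own_cgroup(mount: Path = Path("/sys/fs/cgroup")) -> Path:
    """Locate the calling process's cgroup v2 directory (mount point assumed)."""
    for line in Path("/proc/self/cgroup").read_text().splitlines():
        # The cgroup v2 entry has the form "0::/some/path".
        if line.startswith("0::"):
            return mount / line[3:].lstrip("/")
    raise RuntimeError("no cgroup v2 entry found in /proc/self/cgroup")


def missing_delegation(cgroup: Path) -> list[str]:
    """Return the entries in `cgroup` that the current user cannot write to."""
    missing = []
    # Creating child cgroups needs write and search access on the directory.
    if not os.access(cgroup, os.W_OK | os.X_OK):
        missing.append(str(cgroup))
    for name in DELEGATION_FILES:
        if not os.access(cgroup / name, os.W_OK):
            missing.append(name)
    return missing
```

An empty list from missing_delegation(own_cgroup()) would suggest the pilot can create and manage child cgroups; anything else names what is still owned by root.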
I see that in v22.05, users were already given ownership of the newly created cgroup provided to them (albeit sans the controller files), though this was later removed in commit b0e4223<https://github.com/SchedMD/slurm/commit/b0e422399f43e81903ead651d8da4430ebb8ec89>, where the commit message suggests this behaviour should instead be avoided. With additional permissions on the files that were not delegated at that point, this feature would actually be complete for us. Could you please reconsider supporting unprivileged cgroups v2? For the record, here is the diff<https://github.com/SchedMD/slurm/compare/slurm-22.05...zensanp:slurm:slurm-22.05> against v22.05 that allows us to further slice the allocated slot into smaller chunks.