[slurm-users] How to partition nodes into smaller units

Sat Feb 9 23:03:36 UTC 2019

Hello,

On 05.02.19 16:46, Ansgar Esztermann-Kirchner wrote:
> [...]-- we'd like to have two "half nodes", where
> jobs will be able to use one of the two GPUs, plus (at most) half of
> the CPUs. With SGE, we've put two queues on the nodes, but this
> effectively prevents certain maintenance jobs from running.
> How would I configure these nodes in Slurm?

Why don't you use an additional "maintenance" queue/partition
containing the whole nodes?

Both SGE and SLURM support that.
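
As a rough illustration of that suggestion, here is a small sketch that
renders two overlapping partition definitions for slurm.conf: one for
regular jobs and one "maintenance" partition spanning the same whole nodes.
The node range, the partition names, the "admin" group and the time limits
are made up for the example. PartitionName, Nodes, Default, MaxTime,
AllowGroups and State are ordinary slurm.conf partition options. Note that
this alone does not restrict regular jobs to half a node.

```python
def partition_line(name, nodes, **options):
    """Render one slurm.conf PartitionName line from keyword options."""
    fields = [f"PartitionName={name}", f"Nodes={nodes}"]
    fields += [f"{key}={value}" for key, value in options.items()]
    return " ".join(fields)


# Hypothetical node range; replace it with the real host list.
NODES = "gpu[01-04]"

lines = [
    # Regular partition for everyday jobs.
    partition_line("batch", NODES, Default="YES", MaxTime="2-00:00:00", State="UP"),
    # Maintenance partition over the same whole nodes, restricted to an
    # assumed "admin" group so that only maintenance jobs are submitted there.
    partition_line("maint", NODES, AllowGroups="admin", MaxTime="04:00:00", State="UP"),
]

print("\n".join(lines))
```

The printed lines would go into slurm.conf next to the existing node
definitions.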

> From the docs I gathered
> that MaxTRESPerJob would be a solution, but this is coupled to
> associations, which I do not fully understand. 
> Is this the best/only way to achieve such a partitioning?
> If so, do I need to define an association for every user, or can I
> define a default/skeleton association that new users automatically
> inherit?
> Are there other/better ways to go?

Let's agree on "other" ;)
Use the OS to partition the resources on the host, e.g. with VMs,
systemd-nspawn, ...

Because we have to run VMs and services in parallel to SLURM, I tested
partitioning our (small number of) hosts via Ganeti/KVM.
Side effect: I was able to live-migrate the (virtual) node, including its
running jobs, during maintenance.
Performance was very close to bare metal, and while we are currently not
running our GPU jobs this way, even GPU pass-through should be possible
with negligible performance penalty:

Walters, et al. "GPU Passthrough Performance: A Comparison of KVM, Xen,
VMWare ESXi, and LXC for CUDA and OpenCL Applications"
https://ieeexplore.ieee.org/document/6973796?tp=&arnumber=6973796

A disadvantage of OS-level partitioning might be the additional effort
it requires.
But honestly, I thought about stretching this even further, for two
reasons:
1. to add a bit more flexibility to the (poor?) elastic features of
SLURM by defining purely virtual nodes of different sizes and starting
whatever selection fits on a case-by-case basis
-- then again, I wouldn't want to do that for 900 hosts without a proper
helper program.
2. to separate the (conservative) host OS from the (modern, but stable)
node OS, which eases the differing constraints we had back then
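
To give an idea of what such a helper program could look like on the SLURM
side, here is a minimal sketch that splits one physical host into equally
sized virtual nodes and prints the matching slurm.conf node lines. The
"-v0", "-v1" naming scheme and the example host figures are assumptions;
NodeName, CPUs, RealMemory, Gres and State are ordinary slurm.conf node
options. Creating the VMs themselves (Ganeti/KVM) is not covered here.

```python
def virtual_nodes(host, cpus, gpus, mem_mb, parts):
    """Return slurm.conf NodeName lines for `parts` equal slices of one host."""
    if parts < 1 or cpus % parts or gpus % parts:
        raise ValueError("CPUs and GPUs must divide evenly among the parts")
    lines = []
    for index in range(parts):
        fields = [
            f"NodeName={host}-v{index}",  # hypothetical naming scheme
            f"CPUs={cpus // parts}",
            # Naive equal split; a real VM gets less, since the host OS
            # and the hypervisor need memory too.
            f"RealMemory={mem_mb // parts}",
        ]
        if gpus:
            fields.append(f"Gres=gpu:{gpus // parts}")
        fields.append("State=UNKNOWN")
        lines.append(" ".join(fields))
    return lines


# Example: a made-up host with 32 cores, 2 GPUs and 128000 MB of RAM,
# cut into two "half nodes".
for line in virtual_nodes("gpu01", cpus=32, gpus=2, mem_mb=128000, parts=2):
    print(line)
```

Nodes that declare a Gres of type gpu also need GresTypes=gpu in slurm.conf.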

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Redling
☎ +49 3641 9 44323