[slurm-users] Job flexibility with cons_tres

Ansgar Esztermann-Kirchner aeszter at mpibpc.mpg.de
Mon Feb 8 11:36:06 UTC 2021


Hello List,

we're running a heterogeneous cluster (all x86_64, but many different
node types, ranging from 8 to 64 hardware threads and 1 to 4 GPUs).
Our processing power (for our main application, at least) is 
exclusively provided by the GPUs, so cons_tres looks quite promising:
depending on the size of the job, request an appropriate number of
GPUs. Of course, you have to request some CPUs as well -- ideally,
evenly distributed among the GPUs (e.g. 10 per GPU on a 20-core, 2-GPU
node; 16 per GPU on a 64-core, 4-GPU node).
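For illustration, that per-GPU CPU scaling can be expressed directly in a
batch script -- a sketch only, assuming SelectType=select/cons_tres is
configured; the application name is a placeholder:

```shell
#!/bin/bash
# Request resources per GPU rather than per node, so the same script
# fits any node with enough free GPUs and CPUs (requires cons_tres).
#SBATCH --gpus=2            # total GPUs for the job
#SBATCH --cpus-per-gpu=10   # CPUs scale with GPU count, not node type
#SBATCH --time=01:00:00

srun ./my_gpu_application   # placeholder binary name
```

With --cpus-per-gpu, a 2-GPU job gets 20 CPUs whether it lands on a
20-core, 2-GPU node or a 64-core, 4-GPU node; whether that matches the
desired "use an even share of the node" policy is a separate question.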
One could, of course, use different partitions for different node
types and submit individual jobs with CPU requests tailored to each
partition, but I'd prefer a more flexible approach where a given job
can run on any sufficiently large node.

Is there anyone with a similar setup? Any config options I've missed,
or do you have a work-around?

Thanks,

A.

-- 
Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
http://www.mpibpc.mpg.de/grubmueller/esztermann

