You can’t have both exclusive access to a node and sharing; the two are mutually exclusive. You see the same thing on AWS: you can select either sharing a physical machine or not. There is no “don’t share if possible, and share otherwise”.
Unless you configure SLURM to overcommit CPUs, requesting all the CPUs in a machine gives you exclusive access by definition. But if any of the CPUs are already allocated, your job won’t start.
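For example (a minimal sketch; the 64-core node size is an assumption, substitute your own):

    # Requesting every CPU on a node gives de facto exclusive access:
    sbatch --nodes=1 --ntasks=1 --cpus-per-task=64 job.sh

    # sbatch also has an explicit flag with the same effect:
    sbatch --exclusive job.sh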
One way you can improve this is to configure SLURM to fill each node up with jobs before it starts scheduling jobs onto a new node. This isn’t good for traditional HPC MPI jobs, but if your jobs are all multithreaded or single-threaded it works quite well, and it keeps whole nodes free so that jobs which genuinely do require exclusive access are more likely to be scheduled. This probably means (but others please correct me) that you DON’T want CR_LLN, and you probably DO want CR_Pack_Nodes.
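As a rough sketch of what that looks like in slurm.conf (assuming the cons_tres select plugin; do check the options against your SLURM version):

    # Pack jobs onto the fewest nodes rather than spreading them out
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory,CR_Pack_Nodes

    # CR_LLN would do the opposite, scheduling to the least-loaded node first:
    # SelectTypeParameters=CR_Core_Memory,CR_LLN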
Tim
--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca