I am pretty sure this is impossible with vanilla Slurm.

What might be possible (maybe) is submitting 5-core jobs and using pre/post scripts that, immediately before the job starts, change the requested number of cores to "however many are currently idle on the node where it is scheduled to run". That feels like a nightmare script to write, prone to race conditions (e.g. what if Slurm has scheduled another job to start on the same node at almost the same time?). It may also be impractical (the modified job would probably need to be rescheduled, possibly landing on another node with a different number of idle cores) or outright impossible (maybe Slurm does not allow changing the requested core count after the job has been assigned a node, only at other times, such as submission time).
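
Just to make the mechanics concrete, a rough, untested sketch of that "modify right before start" idea could look something like the script below. It assumes the job was submitted held (sbatch -H), takes the job id and a node name as placeholder arguments, and cheerfully ignores all the race conditions mentioned above:

  #!/bin/bash
  # Untested sketch: resize a held job to the idle cores on a chosen node,
  # then release it. JOBID and NODE are assumed to be supplied by the caller.
  JOBID=$1
  NODE=$2

  # sinfo %C prints CPUs as allocated/idle/other/total for the node
  IDLE=$(sinfo -h -n "$NODE" -o "%C" | cut -d/ -f2)

  if [ "$IDLE" -ge 5 ]; then
      # Pin the job to that node and request exactly the idle CPUs.
      # Nothing stops another job from grabbing those CPUs between
      # the update and the release.
      scontrol update JobId="$JOBID" ReqNodeList="$NODE" NumCPUs="$IDLE"
      scontrol release "$JOBID"
  fi

Even if scontrol accepts the update, the window between counting the idle cores and the job actually starting is exactly where this falls apart.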

What is theoretically possible is to use Slurm only as a "dummy bean counter": submit the job as a 5-core job and let it land and start on a node. The job itself does nothing other than count the number of idle cores on that node and submit *another* Slurm job of the highest priority targeting that specific node (option -w) and that number of cores. If the second job starts, then by some other mechanism, probably external to Slurm, the actual computational job would start on the appropriate cores. If that happens outside of Slurm, it would be very hard to get right (placing the work in the appropriate cgroup, for example). If that happens inside Slurm, it needs some functionality which I am not aware exists, but it sounds more plausible than "changing the number of cores at the moment the job starts". For example, the two jobs could merge into one. Or the two jobs could stay separate but share some MPI communicator or thread space (though again they would have trouble with the separate cgroups they live in).
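
For what it's worth, the "bean counter" placeholder itself could look roughly like the sketch below (again untested; real_job.sh is a made-up name, and the part where the two allocations cooperate afterwards is exactly the functionality I don't think exists):

  #!/bin/bash
  #SBATCH -n 5
  # Untested sketch of the "bean counter" placeholder job: count the idle
  # CPUs on the node it landed on and submit a second job pinned there.

  NODE=$SLURMD_NODENAME

  # Idle CPUs on this node (sinfo %C = allocated/idle/other/total);
  # note the 5 CPUs this placeholder holds are not counted as idle.
  IDLE=$(sinfo -h -n "$NODE" -o "%C" | cut -d/ -f2)

  # -w pins the second job to this node. Whether you can actually raise
  # its priority (sbatch --priority / scontrol update Priority=..., both
  # normally restricted to operators) depends on your site.
  sbatch -w "$NODE" -n "$IDLE" real_job.sh

  # How the two allocations would then merge, share an MPI communicator,
  # or escape their separate cgroups is the open question.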

So, in conclusion: if this is just a few jobs where you are trying to be more efficient, I think it's better to give up. If this is something really large-scale and important, then my recommendation would be to purchase official Slurm support and get assistance from them.

On Fri, Aug 2, 2024 at 8:37 AM Laura Hild via slurm-users <slurm-users@lists.schedmd.com> wrote:
My read is that Henrique wants to specify a job to require a variable number of CPUs on one node, so that when the job is at the front of the queue, it will run opportunistically on however many happen to be available on a single node as long as there are at least five.

I don't personally know of a way to specify such a job, and wouldn't be surprised if there isn't one, since as other posters have suggested, usually there's a core-count sweet spot that should be used, achieving a performance goal while making efficient use of resources.  A cluster administrator may in fact not want you using extra cores, even if there's a bit more speed-up to be had, when those cores could be used more efficiently by another job.  I'm also not sure how one would set a judicious TimeLimit on a job that would have such a variable wall-time.

So there is the question of whether it is possible, and whether it is advisable.
