Hello all,
I'm somewhat new to Slurm, but a long-time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single-CPU tasks.
Let's say I have a long-running job in the cluster that needs to spawn a helper process into the cluster. We have a strong preference for this helper to run on the same cluster node as the original job, but if that node is already fully scheduled, we want the new task to be scheduled on another system without any delay.
The problem I have is that --nodelist doesn't solve this, since it makes the node a hard requirement rather than a preference, and, as far as I can tell, there's no way with --prefer to specify a node name as a resource without creating a gres for every hostname in the cluster.
It seems like what I'm trying to do should be achievable, but having read through the documentation and searched the archives of this list, I'm not seeing a solution.
I'm hoping someone here has some experience with this and can point me in the right direction.
Sincerely,
Alan