Hello all,
I'm somewhat new to Slurm, but a long-time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single-CPU tasks.
Let's say I have a long-running job in the cluster which needs to spawn a helper process in the cluster. We have a strong preference for this helper to run on the same node as the original job, but if that node is already scheduled full, then we want the new task to be scheduled on another system without any delay.
The problem I have is that --nodelist doesn't solve this (the helper would simply pend until the named node has a free CPU), and, as far as I can tell, there's no way with --prefer to express a node name as a soft preference without defining a feature (or gres) for every hostname in the cluster.
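To make that concrete (node042 below is just a placeholder hostname, and helper.sh is a made-up script name):

    # Pins the helper to the job's own node, but the helper just pends
    # if that node is already full, which is the opposite of what we want:
    sbatch --ntasks=1 --nodelist="$SLURMD_NODENAME" helper.sh

    # --prefer (Slurm 22.05 or newer, if I'm reading the docs right) has
    # the soft semantics we want, but it matches node features, so it only
    # helps if slurm.conf tags every node with its own hostname, e.g.:
    #   NodeName=node042 ... Feature=node042
    sbatch --ntasks=1 --prefer="$SLURMD_NODENAME" helper.sh

Maintaining a per-hostname feature (or gres) across the whole cluster is exactly the hack I'm hoping to avoid.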
It seems like what I'm trying to do should be achievable, but having read through the documentation and searched the archives of this list, I'm not seeing a solution.
I'm hoping someone here has some experience with this and can point me in the right direction.
Sincerely,
Alan
Normally I'd address this by having an sbatch script allocate enough resources for both tasks (specifying one node), and then kick off the helper as a separate job step (assuming I'm understanding your issue correctly).
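An untested sketch of what I mean (script and binary names are placeholders; --exact needs a reasonably recent Slurm, older versions spell it --exclusive at the step level):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=2   # one CPU for the main task, one set aside for the helper

    # Run the helper as its own job step in the background; --exact keeps
    # the two steps from sharing CPUs within the allocation.
    srun --ntasks=1 --exact ./helper &

    # The main task runs on the other CPU of the same allocation.
    srun --ntasks=1 --exact ./main_task

    wait   # collect the helper step before the job ends

Since both steps live inside one single-node allocation, the helper is guaranteed to land on the same node as the main task.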
On 2/9/24, 9:57 AM, "Alan Stange via slurm-users" <slurm-users@lists.schedmd.com mailto:slurm-users@lists.schedmd.com> wrote:
Hello all,
I'm somewhat new to Slurm, but long time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single cpu tasks.
Lets say I have a long running job in the cluster, which needs to spawn a helper process into the cluster. We have a strong preference for this helper to run on the same cluster node as the original job, but if that node is already scheduled full, then we want this new task to be scheduled on another systems without any delay.
The problem I have is that the --nodelist doesn't solve this, and, as far as I can tell, there's no option with --prefer to specify a node name as a resource, without creating a gres for every hostname in the cluster.
It seems like what I'm trying to do should be achievable, but having read though the documentation and searched the archives of this list, I'm not seeing a solution.
I'm hoping someone here has some experience with this and can point me in the right direction.
Sincerely,
Alan
-- slurm-users mailing list -- slurm-users@lists.schedmd.com mailto:slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com mailto:slurm-users-leave@lists.schedmd.com
This e-mail and any attachments may contain information that is confidential and proprietary and otherwise protected from disclosure. If you are not the intended recipient of this e-mail, do not read, duplicate or redistribute it by any means. Please immediately delete it and any attachments and notify the sender that you have received it by mistake. Unintended recipients are prohibited from taking action on the basis of information in this e-mail or any attachments. The DRW Companies make no representations that this e-mail or any attachments are free of computer viruses or other defects.
Chip,
Thank you for your prompt response. We could do that, but the helper is optional, and depending on the inputs to the problem being solved the work might at times involve additional helpers; we don't know a priori how many helpers will be needed.
Alan
I imagine you could create a reservation for the node and then remove it when you are completely done.
Each helper could then target the reservation for the job.
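Something along these lines (untested; the reservation name is arbitrary, and note that creating reservations normally requires operator or admin privileges, so it may need to go through a wrapper or your admins):

    # From inside the long-running job, reserve its own node. IGNORE_JOBS
    # lets the reservation be placed on a node that already has running
    # jobs, including this one.
    scontrol create reservation ReservationName=helpers_$SLURM_JOB_ID \
        Nodes=$SLURMD_NODENAME StartTime=now Duration=infinite \
        Users=$USER Flags=IGNORE_JOBS

    # Each optional helper then targets the reservation:
    sbatch --reservation=helpers_$SLURM_JOB_ID helper.sh

    # When everything is done, release the node:
    scontrol delete ReservationName=helpers_$SLURM_JOB_ID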
Brian Andrus