Hello all,
I'm somewhat new to Slurm, but a long-time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single-CPU tasks.
Let's say I have a long-running job in the cluster which needs to spawn a helper process in the cluster. We have a strong preference for this helper to run on the same node as the original job, but if that node is already scheduled full, then we want the new task to be scheduled on another system without any delay.
The problem I have is that --nodelist doesn't solve this (the helper would simply pend until the named node has a free CPU), and, as far as I can tell, there's no way with --prefer to express a node name as a soft preference without defining a feature (or gres) for every hostname in the cluster.
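To make that concrete (node042 below is just a placeholder hostname, and helper.sh is a made-up script name):

    # Pins the helper to the job's own node, but the helper just pends
    # if that node is already full, which is the opposite of what we want:
    sbatch --ntasks=1 --nodelist="$SLURMD_NODENAME" helper.sh

    # --prefer (Slurm 22.05 or newer, if I'm reading the docs right) has
    # the soft semantics we want, but it matches node features, so it only
    # helps if slurm.conf tags every node with its own hostname, e.g.:
    #   NodeName=node042 ... Feature=node042
    sbatch --ntasks=1 --prefer="$SLURMD_NODENAME" helper.sh

Maintaining a per-hostname feature (or gres) across the whole cluster is exactly the hack I'm hoping to avoid.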
It seems like what I'm trying to do should be achievable, but having read through the documentation and searched the archives of this list, I'm not seeing a solution.
I'm hoping someone here has some experience with this and can point me in the right direction.
Sincerely,
Alan
Normally I'd address this by having an sbatch script allocate enough resources for both tasks (specifying one node), and then kick off the helper as a separate job step (assuming I'm understanding your issue correctly).
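An untested sketch of what I mean (script and binary names are placeholders; --exact needs a reasonably recent Slurm, older versions spell it --exclusive at the step level):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=2   # one CPU for the main task, one set aside for the helper

    # Run the helper as its own job step in the background; --exact keeps
    # the two steps from sharing CPUs within the allocation.
    srun --ntasks=1 --exact ./helper &

    # The main task runs on the other CPU of the same allocation.
    srun --ntasks=1 --exact ./main_task

    wait   # collect the helper step before the job ends

Since both steps live inside one single-node allocation, the helper is guaranteed to land on the same node as the main task.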
On 2/9/24, 9:57 AM, "Alan Stange via slurm-users" <slurm-users@lists.schedmd.com mailto:slurm-users@lists.schedmd.com> wrote:
Hello all,
I'm somewhat new to Slurm, but long time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single cpu tasks.
Lets say I have a long running job in the cluster, which needs to spawn a helper process into the cluster. We have a strong preference for this helper to run on the same cluster node as the original job, but if that node is already scheduled full, then we want this new task to be scheduled on another systems without any delay.
The problem I have is that the --nodelist doesn't solve this, and, as far as I can tell, there's no option with --prefer to specify a node name as a resource, without creating a gres for every hostname in the cluster.
It seems like what I'm trying to do should be achievable, but having read though the documentation and searched the archives of this list, I'm not seeing a solution.
I'm hoping someone here has some experience with this and can point me in the right direction.
Sincerely,
Alan
-- slurm-users mailing list -- slurm-users@lists.schedmd.com mailto:slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com mailto:slurm-users-leave@lists.schedmd.com
This e-mail and any attachments may contain information that is confidential and proprietary and otherwise protected from disclosure. If you are not the intended recipient of this e-mail, do not read, duplicate or redistribute it by any means. Please immediately delete it and any attachments and notify the sender that you have received it by mistake. Unintended recipients are prohibited from taking action on the basis of information in this e-mail or any attachments. The DRW Companies make no representations that this e-mail or any attachments are free of computer viruses or other defects.
Chip,
Thank you for your prompt response. We could do that, but the helper is optional, and depending on the inputs to the problem being solved the work might at times involve additional helpers; we don't know a priori how many helpers will be needed.
Alan
I imagine you could create a reservation for the node and then remove it when you are completely done.
Each helper could then target the reservation for the job.
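Something along these lines (untested; the reservation name is arbitrary, and note that creating reservations normally requires operator or admin privileges, so it may need to go through a wrapper or your admins):

    # From inside the long-running job, reserve its own node. IGNORE_JOBS
    # lets the reservation be placed on a node that already has running
    # jobs, including this one.
    scontrol create reservation ReservationName=helpers_$SLURM_JOB_ID \
        Nodes=$SLURMD_NODENAME StartTime=now Duration=infinite \
        Users=$USER Flags=IGNORE_JOBS

    # Each optional helper then targets the reservation:
    sbatch --reservation=helpers_$SLURM_JOB_ID helper.sh

    # When everything is done, release the node:
    scontrol delete ReservationName=helpers_$SLURM_JOB_ID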
Brian Andrus