canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?


Josef Dvoracek

26 Feb 2024

8:27 a.m.

What is the recommended way to run longer interactive jobs on your systems?

Our how-to has users start screen on a front-end node and run srun with bash/zsh inside it, but that creates a dependency between the login node (running screen) and the compute-node job.

On systems with multiple front-ends, users need to remember the login node where their screen session lives.

Is anybody using something more advanced that is still understandable by a casual HPC user?

(I know about Open OnDemand, but using a native console often has certain benefits.)

cheers

josef


Ward Poelmans

26 Feb

8:47 a.m.

Hi,

On 26/02/2024 09:27, Josef Dvoracek via slurm-users wrote:

...

Is anybody using something more advanced that is still understandable by a casual HPC user?

I'm not sure it qualifies but:

sbatch --wrap 'screen -D -m'
srun --jobid <PREV JOB_ID> --pty screen -rd

Or:

sbatch -J screen --wrap 'screen -D -m'
srun --jobid $(squeue -n screen -h -o '%A') --pty screen -rd
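The sbatch/srun pair above can be wrapped in two small helper functions, which may be easier to hand to casual users. This is only a sketch under stated assumptions: the function names, the `JOB_NAME` and `DRY_RUN` variables, and the extra `-u "$USER"` and `-t RUNNING` filters on squeue are additions for illustration and are not part of the original suggestion.

```shell
# Sketch: run screen inside a batch job, then attach to it from any login node.
# JOB_NAME and DRY_RUN are illustrative variables, not Slurm settings.
JOB_NAME="${JOB_NAME:-screen}"

# Run a command, or only print it when DRY_RUN=1.
# The printed form drops shell quoting and is for display only.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Submit a batch job whose only task is a detached screen session.
start_session() {
    run sbatch -J "$JOB_NAME" --wrap 'screen -D -m'
}

# Attach to the screen session inside the job.
# Takes an optional job ID; otherwise looks up the caller's running job by name.
attach_session() {
    local jobid="${1:-}"
    if [ -z "$jobid" ]; then
        jobid="$(squeue -u "$USER" -n "$JOB_NAME" -t RUNNING -h -o '%A' | head -n 1)"
    fi
    if [ -z "$jobid" ]; then
        echo "no running job named '$JOB_NAME' found" >&2
        return 1
    fi
    run srun --jobid "$jobid" --pty screen -rd
}
```

A user would call `start_session` once, wait for the job to start, and then call `attach_session` from whichever login node they happen to land on. Because the screen session lives on the compute node, it no longer depends on a particular front-end.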

Ward

Josef Dvoracek

28 Feb

1:30 p.m.

For some reason, "--wrap" was not part of my repertoire so far.

thanks


Brian Andrus

27 Feb

4:21 p.m.

Josef,

for us, we put a load balancer in front of the login nodes with session affinity enabled. This makes them land on the same backend node each time.

Also, for interactive X sessions, users start a desktop session on the node and then use vnc to connect there. This accommodates disconnection for any reason even for X-based apps.

Personally, I don't care much for interactive sessions in HPC, but there is a large body that only knows how to do things that way, so it is there.

Brian Andrus


Hagdorn, Magnus Karl Moritz

28 Feb

8:10 a.m.

New subject: [ext] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

On Tue, 2024-02-27 at 08:21 -0800, Brian Andrus via slurm-users wrote:

...

for us, we put a load balancer in front of the login nodes with session affinity enabled. This makes them land on the same backend node each time.

Hi Brian, that sounds interesting - how did you implement session affinity? cheers magnus

-- Magnus Hagdorn Charité – Universitätsmedizin Berlin Geschäftsbereich IT | Scientific Computing Campus Charité Mitte BALTIC - Invalidenstraße 120/121 10115 Berlin magnus.hagdorn@charite.de https://www.charite.de HPC Helpdesk: sc-hpc-helpdesk@charite.de

Brian Andrus

8:36 p.m.


Magnus,

That is a feature of the load balancer. Most of them have that these days.

Brian Andrus


Dan Healy

8:54 p.m.


Are most of us using HAProxy or something else?


Cutts, Tim

9:29 p.m.


HAProxy, for on-prem things. In the cloud I just use their load balancers rather than implement my own.

Tim

-- Tim Cutts Scientific Computing Platform Lead AstraZeneca


Brian Andrus

10:04 p.m.


Most of my stuff is in the cloud, so I use their load balancing services.

HAProxy does have sticky sessions, which you can enable based on IP so it works with other protocols: 2 Ways to Enable Sticky Sessions in HAProxy (Guide) https://www.haproxy.com/blog/enable-sticky-sessions-in-haproxy
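For readers who want a starting point, IP-based stickiness for SSH could look roughly like the fragment below. It is a sketch, not anyone's production configuration from this thread: the section names, server names, addresses (taken from the documentation range 192.0.2.0/24), table size, and expiry are all placeholders, and a real setup also needs a `defaults` or `global` section with timeouts.

```
# Sketch only: names, addresses, and sizes are placeholders.
frontend ssh_in
    mode tcp
    bind *:22
    default_backend login_nodes

backend login_nodes
    mode tcp
    balance roundrobin
    # Remember which login node each client IP was sent to.
    stick-table type ip size 100k expire 12h
    stick on src
    server login1 192.0.2.11:22 check
    server login2 192.0.2.12:22 check
```

With this, a user reconnecting from the same source IP within the expiry window should be sent to the same login node, so a screen or tmux session started there can be found again. Users whose IP changes between connections would not benefit.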

Brian Andrus


Chris Samuel

7:18 a.m.

On 26/2/24 12:27 am, Josef Dvoracek via slurm-users wrote:

...

What is the recommended way to run longer interactive jobs on your systems?

We provide NX for our users and also access via JupyterHub.

We also have high priority QOS's intended for interactive use for rapid response, but they are capped at 4 hours (or 6 hours for Jupyter users).
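As an illustration of how a user might request such a session, the sketch below builds an srun command line with a QOS and a wall-time request. The QOS name `interactive` and the helper function are hypothetical; the actual QOS names and limits are site-specific, and only the 4-hour figure comes from the description above.

```shell
# Hypothetical QOS name; each site defines its own.
INTERACTIVE_QOS="${INTERACTIVE_QOS:-interactive}"

# Print the srun command for an interactive login shell.
# The default wall time matches the 4-hour cap mentioned above.
interactive_shell_cmd() {
    local walltime="${1:-04:00:00}"
    printf 'srun --qos=%s --time=%s --pty bash -l\n' "$INTERACTIVE_QOS" "$walltime"
}
```

Running `interactive_shell_cmd 02:00:00` prints the command for a two-hour request; the printed line can be pasted into a terminal or passed to `eval` once the QOS name has been adjusted.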

All the best, Chris
