We use Bright Cluster Manager with Slurm 23.02 on RHEL9. I'm aware of pam_slurm_adopt (https://slurm.schedmd.com/pam_slurm_adopt.html), which does not appear to be included by default in the Bright 'cm' package of Slurm.
Currently, SSH to a node returns: "Login not allowed: no running jobs and no WLM allocations"
Each node has 8 GPUs, so when we drain a node (which can have a job running for up to 5 days), no new jobs can run on it. And since the nodes have 20+ TB (yes, TB) local drives, researchers have work and files on them that they need to retrieve.
Is there a way to use /etc/security/access.conf to work around this, at least temporarily until the reboot, after which we can revert?
Thanks!
Rob
Draining a node will not stop someone logging on via pam_slurm_adopt.
If they have a running job and can log on when the node is not draining, then they can also log on when it is draining.
If they don’t have a running job, they can’t log on whether it is draining or not.
If you want people to be able to log on when they don’t have a job running, you could put them in a group which is given access in access.conf and PAM, as explained here: https://slurm.schedmd.com/pam_slurm_adopt.html#admin_access
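For example (a rough sketch following that page; "hpcadmin" is just a placeholder group name), in /etc/pam.d/sshd the pam_access check goes before pam_slurm_adopt:

    account    sufficient    pam_access.so
    account    required      pam_slurm_adopt.so

and then in /etc/security/access.conf:

    +:hpcadmin:ALL
    -:ALL:ALL

Because pam_access is marked "sufficient", anyone matching the + rule gets in without the job check, and everyone else falls through to pam_slurm_adopt as normal.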
Cheers,
Luke
--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road
Please note I don’t work on Monday.
Thanks for the reply, Luke. I also found that Bright provides a file, /etc/security/pam_bright.d/pam_whitelist.conf, that can be used to allow access.
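From what I can tell (treat this as a guess from inspecting the file; "someuser" is a placeholder), you add one username per line to /etc/security/pam_bright.d/pam_whitelist.conf:

    someuser

and that user can then SSH in regardless of job state; the entry can be removed again after the reboot.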