We use Bright Cluster Manager with Slurm 23.02 on RHEL9. I'm aware of pam_slurm_adopt (https://slurm.schedmd.com/pam_slurm_adopt.html), which does not appear to be included by default in the Bright 'cm' package of Slurm.
Currently, SSH to a node returns: "Login not allowed: no running jobs and no WLM allocations"
Each node has 8 GPUs, so when we drain a node (which can have a job running for up to 5 days), no new jobs can run on it. And since the nodes have 20+ TB (yes, TB) local drives, researchers have work and files on them that they need to retrieve.
Is there a way to use /etc/security/access.conf to work around this, at least temporarily until the reboot, after which we can revert?
Thanks!
Rob
Draining a node will not stop someone logging on via pam_slurm_adopt.
If they have a running job and can log on when the node is not draining, then they can also log on when it is draining.
If they don’t have a running job, they can’t log on whether it is draining or not.
If you want people to be able to log on when they don’t have a job running, you could put them in a group which is given access in access.conf and PAM, as explained here: https://slurm.schedmd.com/pam_slurm_adopt.html#admin_access
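For example (a rough sketch following that page; "hpcadmin" is just a placeholder group name), in /etc/pam.d/sshd the pam_access check goes before pam_slurm_adopt:

    account    sufficient    pam_access.so
    account    required      pam_slurm_adopt.so

and then in /etc/security/access.conf:

    +:hpcadmin:ALL
    -:ALL:ALL

Because pam_access is marked "sufficient", anyone matching the + rule gets in without the job check, and everyone else falls through to pam_slurm_adopt as normal.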
Cheers,
Luke
--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road
Please note I don’t work on Monday.
Thanks for the reply, Luke. I also found that Bright provides a file, /etc/security/pam_bright.d/pam_whitelist.conf, that can be used to allow access.
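From what I can tell (treat this as a guess from inspecting the file; "someuser" is a placeholder), you add one username per line to /etc/security/pam_bright.d/pam_whitelist.conf:

    someuser

and that user can then SSH in regardless of job state; the entry can be removed again after the reboot.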