Draining a node will not stop someone logging on via pam_slurm_adopt.
If they have a running job, and can log on when the node is not draining, then they can log on when it is draining.
If they don’t have a running job, they can’t log on whether it is draining or not.
If you want people to be able to log on when they don’t have a job running, you could put them in a group which is given access in access.conf and PAM, as explained here:
https://slurm.schedmd.com/pam_slurm_adopt.html#admin_access
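For example, a minimal sketch of that setup, assuming a hypothetical group called hpcstaff and an sshd PAM stack laid out roughly as in those docs, is to put pam_access ahead of pam_slurm_adopt in the account section and then grant the group in access.conf:

    # /etc/pam.d/sshd (or the file it includes for account checks)
    # pam_access is "sufficient": if access.conf grants the user, stop here;
    # if it denies them, fall through to pam_slurm_adopt as normal.
    account    sufficient    pam_access.so
    account    required      pam_slurm_adopt.so

    # /etc/security/access.conf
    # Allow members of the (hypothetical) hpcstaff group from anywhere.
    # The final deny only makes pam_access return failure for everyone else,
    # which, because the module is sufficient rather than required, simply
    # hands them on to pam_slurm_adopt.
    +:hpcstaff:ALL
    -:ALL:ALL

Exact module ordering and file names vary by distribution, so check the layout against the linked admin_access section before rolling it out.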
Cheers,
Luke
--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road
Please note I don’t work on Monday.
From: Robert Kudyba via slurm-users <slurm-users@lists.schedmd.com>
Sent: Friday, April 19, 2024 9:17 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] any way to allow interactive jobs or ssh in Slurm 23.02 when node is draining?
We use Bright Cluster Manager with Slurm 23.02 on RHEL9. I know about pam_slurm_adopt https://slurm.schedmd.com/pam_slurm_adopt.html, which does not appear to come by default with the Bright 'cm' package of Slurm.
Currently ssh to a node gets:
Login not allowed: no running jobs and no WLM allocations
We have 8 GPUs per node, so when we drain a node, which may still be running a job of up to 5 days, no new jobs can start on it. And since the nodes have 20+ TB (yes, TB) local drives, researchers have work and files on them that they need to retrieve.
Is there a way to use /etc/security/access.conf to work around this, at least temporarily until the reboot, after which we can revert?
Thanks!
Rob