Draining a node will not stop someone from logging on via pam_slurm_adopt.

 

If they have a running job and can log on when the node is not draining, then they can also log on when it is draining.

 

If they don’t have a running job, they can’t log on, whether the node is draining or not.

 

If you want people to be able to log on even when they don’t have a job running, you could put them in a group that is granted access via access.conf and PAM, as explained here: https://slurm.schedmd.com/pam_slurm_adopt.html#admin_access
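For illustration, a minimal sketch of that approach based on the linked docs; the group name "hpc_admins" is a placeholder for whatever group you use. The idea is to stack pam_access before pam_slurm_adopt in the sshd PAM config, so members of the allowed group succeed before pam_slurm_adopt is consulted:

```
# /etc/pam.d/sshd (account section; order matters)
account    sufficient    pam_access.so
account    required      pam_slurm_adopt.so
```

```
# /etc/security/access.conf
# "hpc_admins" is an example group name
+:root:ALL
+:hpc_admins:ALL
-:ALL:ALL
```

Note the final `-:ALL:ALL` line: without it, pam_access returns success for users it doesn't match, and because it is marked `sufficient` that would let everyone bypass pam_slurm_adopt.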

 

Cheers,

 

Luke

 

--

Luke Sudbery

Principal Engineer (HPC and Storage).

Architecture, Infrastructure and Systems

Advanced Research Computing, IT Services

Room 132, Computer Centre G5, Elms Road

 

Please note I don’t work on Monday.

 

From: Robert Kudyba via slurm-users <slurm-users@lists.schedmd.com>
Sent: Friday, April 19, 2024 9:17 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] any way to allow interactive jobs or ssh in Slurm 23.02 when node is draining?

 


 

We use Bright Cluster Manager with Slurm 23.02 on RHEL9. I know about pam_slurm_adopt (https://slurm.schedmd.com/pam_slurm_adopt.html), which does not appear to come by default with the Bright 'cm' package of Slurm.

 

Currently ssh to a node gets:

Login not allowed: no running jobs and no WLM allocations

 

We have 8 GPUs per node, so when we drain a node, which may be running a job of up to 5 days, no new jobs can run on it. And since the nodes have 20+ TB (yes, TB) local drives, researchers have work and files on them to retrieve.

 

Is there a way to use /etc/security/access.conf to work around this, at least temporarily until the reboot, after which we can revert?

 

Thanks!

 

Rob