The squeue issue is resolved: it was caused by the partition being hidden, and unhiding it solved the problem. However, the ssh issue remains (it looks like the two were separate issues).
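For anyone who hits the same symptom: a partition defined with Hidden=YES in slurm.conf, together with its jobs, is by default not reported by Slurm commands such as squeue (squeue --all, or -a, includes hidden partitions). Below is a minimal Python sketch for spotting hidden partitions in a slurm.conf. It assumes each partition is defined on a single PartitionName= line without backslash continuations. The function name hidden_partitions and the partition and node names in the example are hypothetical, not part of any Slurm tool or of this cluster's configuration.

```python
def hidden_partitions(slurm_conf_text: str) -> list[str]:
    """Return the names of partitions that end up with Hidden=YES.

    Hypothetical helper. Assumes one PartitionName= definition per line
    (no backslash continuations); '#' starts a comment.
    """
    default_hidden = "NO"  # Slurm's default for Hidden
    hidden = []
    for raw_line in slurm_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Collect Key=Value tokens; keys are lower-cased for lenient matching.
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value
        name = fields.get("partitionname")
        if name is None:
            continue  # not a partition definition
        if name.upper() == "DEFAULT":
            # PartitionName=DEFAULT changes the default for later partitions.
            default_hidden = fields.get("hidden", default_hidden)
            continue
        if fields.get("hidden", default_hidden).upper() == "YES":
            hidden.append(name)
    return hidden


if __name__ == "__main__":
    # Hypothetical example configuration.
    conf = """
PartitionName=cpu Nodes=node[01-10] Default=YES
PartitionName=mgpu Nodes=mgpu[01-04] Hidden=YES  # new GPU nodes
"""
    print(hidden_partitions(conf))  # ['mgpu']
```

The sketch honours PartitionName=DEFAULT lines because a Hidden=YES set there silently applies to every partition defined after it.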
pam_slurm_adopt is working on all the other nodes but not on the new ones. Any idea how to solve this?
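When a module works on some nodes but not on others, comparing each new node's configuration against a working node is a reasonable first step. The pam_slurm_adopt documentation requires an account rule that loads pam_slurm_adopt.so in the sshd PAM stack (typically /etc/pam.d/sshd) and PrologFlags=Contain in slurm.conf. The sketch below checks text copied from one node for those two prerequisites only. It is a hypothetical helper (adopt_config_problems is not a Slurm tool). It does not follow PAM include or substack directives and cannot rule out other causes. Because slurm.conf is usually shared across nodes, a difference in the node-local PAM files seems the more likely culprit, though that is an assumption.

```python
def adopt_config_problems(pam_sshd_text: str, slurm_conf_text: str) -> list[str]:
    """Report missing pam_slurm_adopt prerequisites on one node.

    Hypothetical helper. Checks only for (1) an 'account' rule loading
    pam_slurm_adopt.so in the given sshd PAM file and (2) 'Contain' among
    the PrologFlags values in the given slurm.conf text.
    """

    def active_lines(text):
        # Yield non-empty lines with '#' comments removed.
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()
            if line:
                yield line

    problems = []

    # PAM allows a leading '-' on the type (e.g. '-account'), so strip it.
    has_adopt = any(
        line.split()[0].lstrip("-") == "account" and "pam_slurm_adopt.so" in line
        for line in active_lines(pam_sshd_text)
    )
    if not has_adopt:
        problems.append("no 'account ... pam_slurm_adopt.so' rule in the sshd PAM file")

    # Gather every PrologFlags value (comma-separated, compared case-insensitively).
    prolog_flags = []
    for line in active_lines(slurm_conf_text):
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                if key.lower() == "prologflags":
                    prolog_flags.extend(flag.lower() for flag in value.split(","))
    if "contain" not in prolog_flags:
        problems.append("PrologFlags in slurm.conf does not include Contain")

    return problems
```

Running it on the files from a working node and from a new node, then comparing the two results, narrows down where the new nodes differ.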
Best, 

Fritz Ratnasamy

Data Scientist

Information Technology




On Thu, Jun 6, 2024 at 2:11 PM Ratnasamy, Fritz via slurm-users <slurm-users@lists.schedmd.com> wrote:
As admins of the cluster, we do not observe any issues with our newly added GPU nodes.
However, regular users do not see their jobs running on these GPU nodes when they run squeue -u <username> (sacct does show those jobs with a running status), and they cannot ssh to these newly added nodes while they have a running job on them.
I am not sure whether these two issues are related (being unable to ssh to an mgpu node while running a job on it, and squeue not listing the user's job on that same node). No issues have been reported on the other nodes. Does anyone know what is happening?
Best, 

