As admins on the cluster, we do not observe any issues on our newly added GPU nodes.
However, regular users do not see their jobs running on these GPU nodes when they run squeue -u <username> (although sacct shows the jobs in a running state), and they are not able to ssh to these newly added nodes while they have a job running there.
I am not sure whether these two problems are related (a user not being able to ssh to an mgpu node with a running job on it, and squeue not listing that user's job on the same node). No issues have been reported on the other nodes. Does anyone know what is happening?
Best,
Fritz Ratnasamy
Data Scientist
Information Technology