You need to find the node on which the job started, then look at the slurmd log on that node. It may indicate why the launch failed.
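In case it helps, here is a rough sketch of those two steps in Python. It assumes `scontrol` is on your PATH and that `scontrol show job` still reports a `BatchHost=` field for the held job, which may not be the case once the job has been requeued; if the field is missing, `sacct -j <jobid>` or the slurmctld log may show the node instead. The function names are just illustrative, and `SlurmdLogFile` can be `(null)` (logging to syslog) or contain per-node patterns such as `%h`, so treat the printed path as a hint only.

```python
import re
import subprocess
from typing import Optional


def batch_host(scontrol_job_output: str) -> Optional[str]:
    """Return the BatchHost value from `scontrol show job` output, if present."""
    match = re.search(r"\bBatchHost=(\S+)", scontrol_job_output)
    return match.group(1) if match else None


def slurmd_log_path(scontrol_config_output: str) -> Optional[str]:
    """Return the SlurmdLogFile value from `scontrol show config` output, if present."""
    match = re.search(
        r"^\s*SlurmdLogFile\s*=\s*(\S+)", scontrol_config_output, re.MULTILINE
    )
    return match.group(1) if match else None


def run(cmd: list) -> str:
    """Run a command and return its stdout as text; raises if the command fails."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def show_where_to_look(job_id: str) -> None:
    """Print the node the job started on and the configured slurmd log file."""
    host = batch_host(run(["scontrol", "show", "job", job_id]))
    log = slurmd_log_path(run(["scontrol", "show", "config"]))
    print(f"job {job_id}: started on {host}, slurmd log file: {log}")
    if host and log:
        # The log lives on the compute node, so this has to be run there.
        print(f"on {host}, try: grep {job_id} {log}")
```

Calling `show_where_to_look("<jobid>")` on a login or controller node prints the node name and a grep command to run on that node.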
On Tue, 7 Jan 2025 at 11:30, sportlecon sportlecon via slurm-users <slurm-users@lists.schedmd.com> wrote:
slurm 24.11 - squeue displays reason "launch failed requeued held"