Run 'sinfo -R' to see the reason any nodes may be down.
If nodes are down, it may be as simple as running 'scontrol update state=resume nodename=xxxx' to bring them back, depending on the reason they went down.
Otherwise, check the job requirements to see what it is asking for that does not exist: 'scontrol show job xxx'
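
For what it's worth, here is a rough sketch of the two steps above. It lists each down/drained node group with its reason and only prints the matching resume command, so nothing is changed until you have read the reason and decided a resume is appropriate. The helper name print_resume_cmds and the '%N|%E' output format are my own choices for illustration, not anything from your setup.

```shell
#!/bin/sh
# Read "nodelist|reason" lines on stdin and print (do NOT run) a resume
# command for each one. The function name is hypothetical.
print_resume_cmds() {
    while IFS='|' read -r node reason; do
        [ -n "$node" ] || continue          # skip blank lines
        printf '# %s: %s\n' "$node" "$reason"
        printf "scontrol update nodename='%s' state=resume\n" "$node"
    done
}

# Only query Slurm when the client tools are on PATH.
# -R lists reasons, -h drops the header, %N is the node list, %E the reason.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -R -h -o '%N|%E' | print_resume_cmds
fi
```

Run the printed scontrol lines as root or the Slurm admin user once the underlying cause is fixed. If the reason is something like "Not responding", resuming alone will probably not help until slurmd is reachable again on that node.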
Brian Andrus
Please post the output of sinfo and squeue.
Also look at the slurmd log on one of the affected nodes. 'tail -f' is your friend.
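
If you are not sure where the slurmd log lives, something along these lines may help. It is only a sketch: it assumes 'scontrol show config' reports a SlurmdLogFile entry, and the helper name slurmd_log_path is made up for this example. On some sites the value is "(null)" because slurmd logs to syslog, or it contains a per-node pattern, in which case you will need to adjust by hand.

```shell
#!/bin/sh
# Extract the SlurmdLogFile value from `scontrol show config` style
# input ("Key   = value" lines) read on stdin. Hypothetical helper name.
slurmd_log_path() {
    awk -F' *= *' '$1 == "SlurmdLogFile" { print $2 }'
}

# Run this on one of the affected compute nodes.
if command -v scontrol >/dev/null 2>&1; then
    log=$(scontrol show config | slurmd_log_path)
    if [ -f "$log" ]; then
        tail -n 50 "$log"       # last 50 lines of the slurmd log
    else
        echo "slurmd log not found at '$log' (it may go to syslog)" >&2
    fi
fi
```

Swap 'tail -n 50' for 'tail -f' to watch the log live while you restart slurmd or resume the node.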
On Sat, Jan 4, 2025, 8:13 AM sportlecon sportlecon via slurm-users <slurm-users@lists.schedmd.com> wrote:
JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
   26       cpu myscript user1 PD  0:00     4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
Can anyone help fix this?