Run 'sinfo -R' to see the reason recorded for any nodes that are down or drained.

If they are down, it may be as simple as running 'scontrol update nodename=xxxx state=resume' to bring them back, but that depends on why they went down (if that is the issue at all).

Otherwise, run 'scontrol show job xxx' to check the job's requirements and see whether it is asking for something that does not exist.
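
As a rough sketch, the steps above could be wrapped in a small script. The helper names (run, diagnose, resume_node), the DRY_RUN variable and the node name node01 are made up for illustration; only the sinfo and scontrol commands come from this thread. By default it prints the commands instead of running them, so nothing changes until you have read the reason for the down state.

```shell
#!/bin/sh
# Triage sketch for a job stuck pending because its nodes are DOWN or DRAINED.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to run it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show why nodes are down or drained, then what the job is asking for.
diagnose() {
    job_id="$1"
    run sinfo -R
    run scontrol show job "$job_id"
}

# Return a node to service once the underlying cause has been dealt with.
resume_node() {
    node="$1"
    run scontrol update nodename="$node" state=resume
}
```

For the job in the original post that would be 'diagnose 26', followed by 'resume_node node01' for each affected node (with DRY_RUN=0 once you are happy with what it prints).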

Brian Andrus

On 1/4/2025 3:41 AM, John Hearns via slurm-users wrote:
Please post the output of sinfo and squeue.

Also look at the slurmd log on one of the affected nodes.
'tail -f' is your friend.
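
If you do not know where the slurmd log lives, one possible way to find it is to read the SlurmdLogFile setting from 'scontrol show config'. This is only a sketch: the function name slurmd_log_path is made up, and it assumes the setting is a plain file path. If SlurmdLogFile is unset, slurmd logs via syslog instead and there is no file to tail.

```shell
#!/bin/sh
# Hypothetical helper: read `scontrol show config` output on stdin and
# print the value of the SlurmdLogFile setting (lines look like "Key = value").
slurmd_log_path() {
    awk -F' *= *' '$1 == "SlurmdLogFile" { print $2 }'
}
```

On the node itself you could then run: tail -f "$(scontrol show config | slurmd_log_path)"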

On Sat, Jan 4, 2025, 8:13 AM sportlecon sportlecon via slurm-users <slurm-users@lists.schedmd.com> wrote:
JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
   26       cpu myscript    user1 PD  0:00     4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
Can anyone help me fix this?
