JOBID PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
   26 cpu       myscript user1 PD 0:00     4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)

Can anyone help to fix this?
Please post the output of sinfo and squeue.
Also look at the slurmd log on an example node. tail -f is your friend.
On Sat, Jan 4, 2025, 8:13 AM sportlecon sportlecon via slurm-users <slurm-users@lists.schedmd.com> wrote:
JOBID PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
   26 cpu       myscript user1 PD 0:00     4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)

Can anyone help to fix this?
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
Run 'sinfo -R' to see the reason any nodes may be down.
It may be as simple as running 'scontrol update state=resume nodename=xxxx' to bring them back, if they are down. It depends on the reason they went down (if that is the issue).
Otherwise, check the job requirements to see whether it is asking for something that does not exist: 'scontrol show job xxx'
Brian Andrus
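The checks suggested above can be strung together roughly as follows. This is a minimal sketch, not a tested procedure for any particular cluster: filter_unavailable is a hypothetical helper name (not a Slurm command), and it assumes node states are supplied one per line as "<nodename> <state>", which is what sinfo -N -h -o "%N %T" prints.

```shell
#!/usr/bin/env bash
# Sketch only: list nodes whose state suggests they cannot run jobs.
# Input on stdin: lines of "<nodename> <state>", e.g. from
#   sinfo -N -h -o "%N %T"
# filter_unavailable is a hypothetical helper name, not part of Slurm.
filter_unavailable() {
    # Match states such as down, down*, drained, draining, fail, failing.
    # sort -u removes repeats when a node belongs to several partitions.
    awk '{ s = tolower($2) } s ~ /^(down|drain|fail)/ { print $1 }' | sort -u
}

# Typical use on a cluster (requires Slurm; commands taken from the replies):
#   sinfo -R                                      # reason each node is down or drained
#   sinfo -N -h -o "%N %T" | filter_unavailable   # which nodes are affected
#   scontrol show job 26                          # what the pending job requests
#   scontrol update state=resume nodename=xxxx    # only once the cause is fixed
```

The filter only reads text, so it can be tried on saved sinfo output before touching the cluster. Resuming a node is left as a commented-out manual step because, as noted above, whether it is appropriate depends on why the node went down.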
On 1/4/2025 3:41 AM, John Hearns via slurm-users wrote:
Please post the output of sinfo and squeue.
Also look at the slurmd log on an example node. tail -f is your friend.
On Sat, 2025-01-04 at 08:11:21 -0000, Slurm users wrote:
JOBID PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
   26 cpu       myscript user1 PD 0:00     4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)

Can anyone help to fix this?
Not without a little bit of extra information, e.g. "sinfo -p cpu" and maybe "scontrol show job=26"
Best, Steffen