[slurm-users] only 1 job running
Andy Riebs
andy at candooz.com
Thu Jan 28 14:53:39 UTC 2021
Hi Chandler,
If the only changes to your system have been to the slurm.conf
configuration and the addition of a new node, the easiest way to track
this down is probably for you to show us a diff between the previous
and current versions of slurm.conf, along with a note about what is
different about the new node.
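
For example, if you still have a copy of the old file around (the paths
below are only a guess at where it might live), something like

diff -u /etc/slurm/slurm.conf.old /etc/slurm/slurm.conf

would show us exactly what changed.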
Andy
On 1/28/2021 1:18 AM, Chandler wrote:
> Made a little bit of progress by running sinfo:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> defq*        up   infinite      3  drain n[011-013]
> defq*        up   infinite      1  alloc n010
>
> Not sure why n[011-013] are in the drain state; that needs to be fixed.
>
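To see why Slurm drained them, it is worth checking the reason it
recorded before forcing the state back to idle, e.g.

sinfo -R
scontrol show node n011 | grep -i reason

Nodes usually get drained for a concrete reason (a failed prolog, a
memory or CPU count that doesn't match slurm.conf, and so on), and that
reason often points straight at the underlying problem.
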
> After some searching, I ran:
>
> scontrol update nodename=n[011-013] state=idle
>
> and now 1 additional job has started on each of n[011-013], so now 4
> jobs are running but the rest are still queued. They should all be
> running. After some more searching, I guess resource sharing needs to
> be turned on? Can you help with doing that? I've also attached the
> slurm.conf.
>
> Thanks
>
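Until we see the config diff I'm only guessing, but the most common
cause of one-job-per-node is the default SelectType=select/linear,
which allocates whole nodes to jobs. If you want several jobs to share
a node's cores and memory, the consumable-resource plugin is the usual
answer; a sketch only, to be adapted to your site:

# slurm.conf -- sketch; changing SelectType requires restarting the
# Slurm daemons, an "scontrol reconfigure" is not enough
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

With that in place jobs are packed onto nodes by core and memory, and
OverSubscribe=YES on the partition is only needed if you also want more
than one job per core. Let's look at the real config before changing
anything, though.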