[slurm-users] only 1 job running

Brian Andrus toomuchit at gmail.com
Thu Jan 28 19:07:41 UTC 2021


Heh. Your nodes are drained.

do:

scontrol update state=resume nodename=n[011-013]

If they go back into a drained state, you need to find out why. The 
reason will be in the slurmctld log. You can also see it with 'sinfo -R'.
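
As a rough sketch of that check (not something from the original thread): the helper below, drained_nodes, is a hypothetical name and simply filters the default sinfo column layout shown in the quoted output for rows whose STATE starts with "drain". It assumes the default sinfo format (STATE in column 5, NODELIST in column 6), and n011 is just one of the nodes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the node lists that a default `sinfo` listing
# reports as drained. Reads sinfo output on stdin, so it also works on
# saved output. Assumes the default column order:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
drained_nodes() {
    awk '$5 ~ /^drain/ { print $6 }'
}

# On a live cluster (only runs if the Slurm client tools are on PATH):
if command -v sinfo >/dev/null 2>&1; then
    sinfo | drained_nodes        # which node lists are currently drained
    sinfo -R                     # recorded reason for each down/drained node
    scontrol show node n011      # full record for one node, including Reason=
fi
```

If a node keeps returning to drain after a resume, the reason reported by these commands is the place to start.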

Brian Andrus

On 1/27/2021 10:18 PM, Chandler wrote:
> Made a little bit of progress by running sinfo:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> defq*        up   infinite      3  drain n[011-013]
> defq*        up   infinite      1  alloc n010
>
> not sure why n[011-013] are in drain state, that needs to be fixed.
>
> After some searching, I ran:
>
> scontrol update nodename=n[011-013] state=idle
>
> and now 1 additional job has started on each of the n[011-013], so now 
> 4 jobs are running but the rest are still queued.  They should all be 
> running.  After some more searching, I guess resource sharing needs to 
> be turned on?  Can anyone help with doing that?  I also attached the slurm.conf.
>
> Thanks
>


