[slurm-users] node showing "Low socket*core count"
Noam Bernstein
noam.bernstein at nrl.navy.mil
Wed Oct 10 14:55:51 MDT 2018
> On Oct 10, 2018, at 12:07 PM, Noam Bernstein <noam.bernstein at nrl.navy.mil> wrote:
>
>
> slurmd -C confirms that indeed slurm understands the architecture, so that’s good. However, removing the CPUs entry from the node list doesn’t change anything. It still drains the node. If I just remove _everything_ having to do with those counts from the node list item it just picks 1 cpu.
>
Interestingly, it looks like the only remaining problem may have been that I had to manually set the node's state to Resume. I was playing around with different settings for various properties, and at some point definitely had things screwed up in a way that would trigger a drain state. Apparently, after I fixed the configuration, Slurm was actually OK with it but didn't clear the drain state on its own. It seems to be OK now that all the items are set consistently.
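
For anyone who lands here with the same symptom, the manual step above can be sketched roughly as follows. This is only an illustration, not something taken from the thread: the helper names `resume_command` and `resume_node` and the node name `node01` are made up, and it assumes `scontrol` is on the PATH and that you have the privileges to update node state. The underlying command is `scontrol update NodeName=<node> State=RESUME`.

```python
import shutil
import subprocess


def resume_command(node_name):
    """Build the scontrol call that asks Slurm to return a node to service.

    Hypothetical helper: it only assembles the argument list.
    """
    return ["scontrol", "update", f"NodeName={node_name}", "State=RESUME"]


def resume_node(node_name):
    """Run the command if scontrol is available; otherwise print it.

    Hypothetical helper. Returns the CompletedProcess on success,
    or None when scontrol is not installed on this machine.
    """
    cmd = resume_command(node_name)
    if shutil.which("scontrol") is None:
        print("scontrol not found; would run:", " ".join(cmd))
        return None
    # check=True raises CalledProcessError if scontrol rejects the update.
    return subprocess.run(cmd, check=True)
```

Calling `resume_node("node01")` on the controller would issue the update for that (hypothetical) node. Resuming only helps once the node definition in slurm.conf agrees with what `slurmd -C` reports; otherwise the node will presumably just drain again.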