[slurm-users] [EXT]Re: only 1 job running

Brian Andrus toomuchit at gmail.com
Thu Jan 28 22:50:30 UTC 2021


Yep, looks like you are on the right track.

If the CPU count the node reports does not match what slurm expects, 
slurm will drain the node and jobs will not be able to start on it.
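One way to spot this is to look at the State and Reason fields of 
`scontrol show node <name>`. Below is a minimal Python sketch that parses 
that key=value output and reports a drain. The SAMPLE text is illustrative 
only (hypothetical node name, CPU count and timestamp), not real output from 
your cluster, and the parser assumes Reason= sits on its own line because 
its value can contain spaces.

```python
from __future__ import annotations

# Illustrative (hypothetical) text in the layout of `scontrol show node node01`.
SAMPLE = """\
NodeName=node01 Arch=x86_64 CoresPerSocket=64
   CPUAlloc=0 CPUTot=255 CPULoad=0.00
   State=IDLE+DRAIN ThreadsPerCore=2
   Reason=Low socket*core*thread count, Low CPUs [root@2021-01-28T12:00:00]
"""


def parse_node(text: str) -> dict[str, str]:
    """Turn scontrol's whitespace-separated key=value layout into a dict.

    Reason= is special-cased: its value may contain spaces, so it takes
    the rest of the line (assumption: Reason is on its own line).
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Reason="):
            fields["Reason"] = line[len("Reason="):]
            continue
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                fields[key] = value
    return fields


def drain_report(text: str) -> str | None:
    """Return a one-line message if any State flag starts with DRAIN, else None."""
    node = parse_node(text)
    flags = node.get("State", "").split("+")
    if any(flag.startswith("DRAIN") for flag in flags):
        return f"{node.get('NodeName')} is {node['State']}: {node.get('Reason', '?')}"
    return None


print(drain_report(SAMPLE))
```

The Reason string is usually the quickest pointer to why slurm took the 
node out of service.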

There does seem to be more to it, though. Detailed info about a job and 
a node would help.

You can ignore the pending jobs with 'Priority' as the reason. Those 
aren't starting because another job is supposed to go first: the one 
with 'Resources' as the reason.
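To find that job, you can list only pending jobs with their reason, e.g. 
`squeue -h -t PENDING -o "%i %r"` (job ID and reason, no header). The 
Python sketch below groups such output by reason; the job IDs in SAMPLE 
are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Illustrative output of: squeue -h -t PENDING -o "%i %r"  (hypothetical job IDs)
SAMPLE = """\
101 Resources
102 Priority
103 Priority
"""


def group_by_reason(text: str) -> dict[str, list[str]]:
    """Map each pending reason to the job IDs waiting for that reason."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First field is the job ID; the rest of the line is the reason.
        job_id, _, reason = line.partition(" ")
        groups[reason].append(job_id)
    return dict(groups)


groups = group_by_reason(SAMPLE)
# "Priority" jobs are only queued behind the "Resources" job(s),
# so the "Resources" job is the one to inspect with `scontrol show job <id>`.
print("inspect:", groups.get("Resources", []))
print("queued behind it:", groups.get("Priority", []))
```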

'Resources' means the scheduler has already allocated the resources on 
the node, so there aren't any left for that job to use.
My bet here is that you aren't specifying memory. If you don't specify 
it, slurm allocates all of the node's memory to the job. So, even if a 
job is only using 1 CPU, all the memory is allocated, leaving none for 
any other job to run on the unallocated CPUs.
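A bit of arithmetic shows the effect. This is a sketch under assumptions: 
a hypothetical node with 255 CPUs and 512 GiB of RAM, 16 CPUs per job 
(from your description), and "no memory request" modeled as "charged the 
whole node" as described above. The actual default depends on the site's 
slurm.conf (e.g. DefMemPerCPU / DefMemPerNode); in a job script the 
request is made with --mem or --mem-per-cpu.

```python
def concurrent_jobs(node_cpus, node_mem_mb, job_cpus, job_mem_mb=None):
    """How many identical jobs fit on one node at the same time.

    job_mem_mb=None models a job that requests no memory and is
    therefore charged with all of the node's memory (assumption above).
    """
    if job_mem_mb is None:
        job_mem_mb = node_mem_mb
    # A job needs both its CPUs and its memory, so the tighter limit wins.
    return min(node_cpus // job_cpus, node_mem_mb // job_mem_mb)


NODE_CPUS = 255            # as reported in this thread
NODE_MEM_MB = 512 * 1024   # hypothetical 512 GiB node

print(concurrent_jobs(NODE_CPUS, NODE_MEM_MB, 16))                         # 1  (no memory request)
print(concurrent_jobs(NODE_CPUS, NODE_MEM_MB, 16, job_mem_mb=32 * 1024))   # 15 (e.g. --mem=32G)
```

With no memory request, memory is the limit and only one job runs; with an 
explicit request, CPUs become the limit instead.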

Brian Andrus

On 1/28/2021 2:15 PM, Chandler wrote:
>
> Brian Andrus wrote on 1/28/21 13:59:
>> What are the specific requests for resources from a job?
>> Nodes, Cores, Memory, threads, etc?
>
> Well, the jobs are only asking for 16 CPUs each. The 255 threads is 
> weird, though; it seems to be related to this:
> https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads 
>
>
> The vendor recommended turning on IOMMU in the BIOS, so I will try that 
> and see if it helps...
>