[slurm-users] wrong number of jobs used

Adrian Sevcenco Adrian.Sevcenco at spacescience.ro
Tue Jan 19 19:50:15 UTC 2021


Hi! I have a very strange situation that I don't even know how to 
troubleshoot.
I'm running with
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory,CR_LLN
TaskPlugin=task/affinity,task/cgroup
TaskPluginParam=autobind=threads

and a partition defined with:
LLN=yes DefMemPerCPU=4000 MaxMemPerCPU=4040

PriorityType=priority/basic
SchedulerType=sched/builtin

This is a HEP cluster, so it runs only serial, single-threaded jobs.

(Physically, all nodes have 4 GB per thread.)
The nodes are now defined with only CPUs and RealMemory (obtained from 
slurmd -C on each node). I arrived at this only after a lot of 
experimentation and the realization that the more detailed node 
properties can be, and in my case were, incompatible with CR_CPU.

and with FastSchedule=0
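For reference, a node entry now looks like the following (the hostname range and values are made up for illustration; RealMemory is whatever slurmd -C reports on that node):

```
NodeName=wn[01-23] CPUs=64 RealMemory=257000 State=UNKNOWN
```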

The problem is that the number of allocated CPUs in the partition is 
stuck at a low value (around 834 out of 1424):

AVAIL NODES(A/I/O/T)  CPUS(A/I/O/T)    DEFAULTTIME    TIMELIMIT
up      23/0/0/23     837/587/0/1424   1-00:00:00   2-00:00:00
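One thing I keep re-checking is the arithmetic: with CR_CPU_Memory, a node can only be allocated as many serial jobs as both its CPU count and its RealMemory divided by DefMemPerCPU allow. A minimal sketch of that check (the node values below are hypothetical, not from my cluster):

```python
# Sanity check for CR_CPU_Memory scheduling: the usable slots on a node
# are limited by both CPUs and RealMemory // DefMemPerCPU.
DEF_MEM_PER_CPU = 4000  # MB, from the partition's DefMemPerCPU

def usable_slots(cpus, real_memory_mb):
    """CPUs that can actually be allocated when every serial job
    requests the default DefMemPerCPU of memory."""
    return min(cpus, real_memory_mb // DEF_MEM_PER_CPU)

# A hypothetical 64-thread node reporting 257000 MB: memory allows
# 257000 // 4000 = 64 slots, so all 64 CPUs are usable.
print(usable_slots(64, 257000))

# The same node reporting only 250000 MB would lose two slots to
# the memory limit: 250000 // 4000 = 62.
print(usable_slots(64, 250000))
```

In my case the reported RealMemory seems high enough not to be the limiting factor, which is why the stuck allocation is so puzzling.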


I set SlurmctldDebug=debug and
DebugFlags=Priority,SelectType,NodeFeatures,CPU_Bind,NO_CONF_HASH

but I am not able to spot anything that looks like a problem in the logs.

Does anyone have any idea why not all of my slots are being used?

Thank you!!
Adrian
