[slurm-users] Another question about partition and node allocation

Renata Maria Dart renata at slac.stanford.edu
Fri Apr 10 21:32:04 UTC 2020


Hi, we have 40 identical AMD nodes with 128 cores each.  They were
purchased by different groups at our lab, and each group would of course
like immediate access to what it has paid for.  The stakeholder groups
are also fine with allowing the general public to use their hosts/cores,
provided they can preempt the general public's jobs.  One way I can see
to do that is to define a partition for each stakeholder group and
assign it specific nodes, something like this:

PartitionName=shared     Default=yes   Priority=10   MaxTime=5-00:00:00     DefaultTime=30            PreemptMode=CANCEL   State=UP   Nodes=amd[0001-0040]
PartitionName=exp1       Default=no    Priority=50   MaxTime=5-00:00:00     DefaultTime=1-00:00:00    PreemptMode=OFF      State=UP   Nodes=amd[0001-0003]
PartitionName=exp2       Default=no    Priority=50   MaxTime=5-00:00:00     DefaultTime=1-00:00:00    PreemptMode=OFF      State=UP   Nodes=amd[0004-0019]
PartitionName=exp3       Default=no    Priority=50   MaxTime=5-00:00:00     DefaultTime=1-00:00:00    PreemptMode=OFF      State=UP   Nodes=amd[0020-0040]


Is this the most efficient use of resources?  In the above scenario,
if scavenger jobs (general-public jobs from the shared partition) are
running on a given experiment's hosts and the experiment needs to run
jobs, then the scavenger jobs get preempted, even if there are idle
hosts in the other stakeholder partitions.  Is there a way to guarantee,
say, exp1 priority on 384 cores (3 nodes x 128 cores) without
necessarily tying them to 3 specific hosts?
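
For reference, here is a rough sketch of the core counts implied by the
node ranges in the partition lines, assuming every node really does
expose 128 cores to Slurm.  The helper names (node_count,
partition_cores) are made up for illustration and are not part of Slurm;
the parser only handles a simple prefix[first-last] range like the ones
used here.

```python
import re

CORES_PER_NODE = 128  # from the post: every node has 128 cores

# Node ranges copied from the partition definitions in this message.
PARTITIONS = {
    "shared": "amd[0001-0040]",
    "exp1": "amd[0001-0003]",
    "exp2": "amd[0004-0019]",
    "exp3": "amd[0020-0040]",
}


def node_count(nodelist: str) -> int:
    """Count nodes in a simple 'prefix[first-last]' range (no commas, no nesting)."""
    match = re.fullmatch(r"[A-Za-z]+\[(\d+)-(\d+)\]", nodelist)
    if match is None:
        raise ValueError(f"unsupported node list: {nodelist!r}")
    first, last = int(match.group(1)), int(match.group(2))
    return last - first + 1


def partition_cores(partitions=PARTITIONS, cores_per_node=CORES_PER_NODE):
    """Map each partition name to the total cores its node range provides."""
    return {
        name: node_count(nodes) * cores_per_node
        for name, nodes in partitions.items()
    }


if __name__ == "__main__":
    for name, cores in partition_cores().items():
        print(f"{name:8s} {cores:5d} cores")
```

With the ranges above this works out to 384 cores for exp1, 2048 for
exp2 and 2688 for exp3, which together account for all 5120 cores in the
shared partition.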

 Thanks,
 Renata




