[slurm-users] Meaning of assoc_limit_stop

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Mon Oct 22 10:59:04 MDT 2018


Hi,

My question is about the SchedulerParameters option assoc_limit_stop, which the slurm.conf man page describes as follows:

"If set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utilization, but avoid potentially starving larger jobs by preventing them from launching indefinitely."

Does this mean that if some group is at its association limit and this parameter is set, then no lower priority jobs from any other group in the partition will be considered for scheduling? I can't see why a site would ever want that.

Or does the parameter really mean this:

"If set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs FROM THAT GROUP in that partition. Setting this can decrease system throughput and utilization, but avoid potentially starving larger jobs by preventing them from launching indefinitely."

If the meaning is the latter, then I don't see how it can decrease system throughput and utilization. If it does mean the latter, I think we'd want to set it so that the scheduler isn't spending time on potentially many thousands of jobs from a group that is at its association limit, which could improve responsiveness.
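
To make the two readings concrete, here is a toy sketch in Python. It is not Slurm's scheduler code, and it does not claim to show which reading is correct. The names Job, startable, scope, and the group names are all hypothetical, and "group" simply stands in for an association.

```python
# Toy model only: NOT Slurm's scheduler. All names here are hypothetical.
from typing import List, NamedTuple, Set


class Job(NamedTuple):
    name: str
    group: str      # stands in for the job's association
    priority: int


def startable(jobs: List[Job], groups_at_limit: Set[str], scope: str) -> List[str]:
    """Return the names of jobs the toy scheduler would try to start.

    scope="partition": the first job blocked by an association limit stops
                       every lower priority job in the partition (reading 1).
    scope="group":     a blocked job only stops lower priority jobs from
                       its own group (reading 2).
    """
    started = []
    partition_blocked = False
    blocked_groups = set()
    # Walk the queue from highest to lowest priority.
    for job in sorted(jobs, key=lambda j: -j.priority):
        if partition_blocked or job.group in blocked_groups:
            continue  # skipped without being evaluated at all
        if job.group in groups_at_limit:
            if scope == "partition":
                partition_blocked = True
            else:
                blocked_groups.add(job.group)
            continue
        started.append(job.name)
    return started


queue = [
    Job("b1", "groupB", 100),
    Job("a1", "groupA", 90),
    Job("c1", "groupC", 80),
    Job("a2", "groupA", 70),
]

# groupA is at its association limit in both runs.
print(startable(queue, {"groupA"}, "partition"))  # ['b1']
print(startable(queue, {"groupA"}, "group"))      # ['b1', 'c1']
```

Under the first reading, c1 from groupC never gets a chance once a1 is blocked, which is where lost throughput would come from. Under the second reading, only groupA's remaining jobs are skipped.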

Does anyone run this parameter in production and can answer this?

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 


