<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>You might try a partition QoS that sets GrpTRESMins or
GrpTRESRunMins; see <a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/resource_limits.html">https://slurm.schedmd.com/resource_limits.html</a></p>
<p>There are a bunch of options which may do what you want.</p>
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 3/10/2021 9:13 AM, Marcel Breyer
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:f5880f5f-d58b-5fb0-b8e4-888b9210052d@ipvs.uni-stuttgart.de">
<p>Greetings,</p>
<p>We know about the SLURM configuration option <b>MaxSubmitJobsPerUser</b>,
which limits the number of jobs a user can have submitted at any given time. <br>
</p>
<p>We would like a similar policy that caps the total requested
time of all of a user's jobs at a certain limit.</p>
<p>For example, with the existing option (<b>MaxSubmitJobsPerUser = 2</b>):</p>
<p>srun --time 10 ...<br>
srun --time 20 ...<br>
srun --time 10 ... &lt;- fails since only 2 jobs are allowed per
user</p>
<p><br>
</p>
<p>However, we want something like the following (with a maximum
aggregate time of, e.g., 40 minutes):</p>
<p>srun --time 10 ...<br>
srun --time 20 ...<br>
srun --time 10 ...<br>
srun --time 5 ... &lt;- fails since the total job time (45 minutes)
would exceed 40 minutes</p>
<p><br>
</p>
<p>Another possible submission pattern would be:</p>
<p>srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ...<br>
srun --time 5 ... &lt;- fails since the total job time would exceed
40 minutes (however, once the first job has completed, the new job
can be submitted normally)<br>
</p>
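<p>To make the intended behavior concrete, here is a minimal sketch of
the policy as a small Python model. It is not Slurm configuration: the
class <code>AggregateTimeLimiter</code>, its methods, and the user names
are hypothetical, and the model assumes that all times are in minutes and
that a completed job releases its requested time.</p>
<pre>
```python
from collections import defaultdict


class AggregateTimeLimiter:
    """Toy admission check: cap the summed time limits of a user's active jobs."""

    def __init__(self, max_total_minutes):
        self.max_total_minutes = max_total_minutes
        # user -> {job_id: requested minutes} for jobs that are pending or running
        self.active = defaultdict(dict)
        self.next_job_id = 1

    def submit(self, user, minutes):
        """Return a job id if accepted, or None if the cap would be exceeded."""
        used = sum(self.active[user].values())
        if used + minutes > self.max_total_minutes:
            return None  # rejected, like the failing srun in the examples
        job_id = self.next_job_id
        self.next_job_id += 1
        self.active[user][job_id] = minutes
        return job_id

    def complete(self, user, job_id):
        """Release the minutes held by a finished job."""
        self.active[user].pop(job_id, None)


limiter = AggregateTimeLimiter(max_total_minutes=40)

# First pattern: 10 + 20 + 10 = 40 is accepted, a further 5 is rejected.
first = [limiter.submit("alice", m) for m in (10, 20, 10, 5)]
first_accepted = [job is not None for job in first]

# Second pattern: eight 5-minute jobs fit, the ninth is rejected ...
second = [limiter.submit("bob", 5) for _ in range(9)]
second_accepted = [job is not None for job in second]

# ... until one of bob's jobs completes and frees 5 minutes.
limiter.complete("bob", second[0])
retry_accepted = limiter.submit("bob", 5) is not None

print(first_accepted, second_accepted, retry_accepted)
```
</pre>
<p>The model tracks only the requested time per user, so it reproduces
the two submission patterns above without saying anything about how the
jobs are ordered or scheduled.</p>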
<p><br>
</p>
<p>In essence, we would like to keep the FIFO scheduler (so that we
don't have to configure a more complex one) while still
guaranteeing that another user gets a chance to access a machine
after at most X time units (40 minutes in the example above). <br>
</p>
<p>With the <b>MaxSubmitJobsPerUser</b> option alone, we would either
have to allow only a very small number of jobs (penalizing users who
split their computation into small sub-jobs), or X would be
rather large (num_jobs * max_wall_time).</p>
<p>Is there an option in SLURM that mimics such a behavior?<br>
</p>
<p>With best regards,<br>
Marcel Breyer<br>
</p>
</blockquote>
</body>
</html>