<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div>Hi Loris,</div><div><br></div><div>Here is our submit filter for what you're asking about. It checks for both --gres and --gpus.</div><div><br></div><div style="margin-left:40px"><pre>ESLURM_INVALID_GRES=2072
ESLURM_BAD_TASK_COUNT=2025
if ( job_desc.partition ~= slurm.NO_VAL ) then
    if (job_desc.partition ~= nil) then
        if (string.match(job_desc.partition,"gpgpu") or string.match(job_desc.partition,"gpgputest")) then
            --slurm.log_info("slurm_job_submit (lua): detect job for gpgpu partition")
            --Alert on invalid gpu count - eg: gpu:0 , gpu:p100:0
            if (job_desc.gres and string.find(job_desc.gres, "gpu")) then
                local numgpu = string.match(job_desc.gres, ":%d+$")
                if (numgpu ~= nil) then
                    numgpu = numgpu:gsub(':', '')
                    if ( tonumber(numgpu) &lt; 1) then
                        slurm.log_user("Invalid GPGPU count specified in GRES, must be greater than 0")
                        return ESLURM_INVALID_GRES
                    end
                end
            else
                --Alternatively, GPUs may be requested with the --gpus* options in newer versions of Slurm
                if (job_desc.tres_per_node == nil) then
                    if (job_desc.tres_per_socket == nil) then
                        if (job_desc.tres_per_task == nil) then
                            slurm.log_user("You tried submitting to a GPGPU partition, but you didn't request one with GRES or GPUS")
                            return ESLURM_INVALID_GRES
                        else
                            if (job_desc.num_tasks == slurm.NO_VAL) then
                                slurm.log_user("--gpus-per-task option requires --ntasks specification")
                                return ESLURM_BAD_TASK_COUNT
                            end
                        end
                    end
                end
            end
        end
    end
end</pre></div><div style="margin-left:40px"><br></div><div>Please let me know if you improve it. 
We're always on the hunt to fix up some of the logic in the submit filter.</div><div><br></div><div>Cheers,</div><div>Sean</div><div><br></div><div><div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">--<br>Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead<br>Research Computing Services | Business Services<br>The University of Melbourne, Victoria 3010 Australia<br><br></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 4 Dec 2020 at 23:58, Loris Bennett <<a href="mailto:loris.bennett@fu-berlin.de">loris.bennett@fu-berlin.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi,<br>
<br>
I want to reject jobs that don't specify any GPUs when accessing our GPU<br>
partition and have the following in job_submit.lua:<br>
<br>
<pre>if (job_desc.partition == "gpu" and job_desc.gres == nil ) then
    slurm.log_user(string.format("Please request GPU resources in the partition 'gpu', " ..
                                 "e.g. '#SBATCH --gres=gpu:1' " ..
                                 "Please see 'man sbatch' for more details)"))
    slurm.log_info(string.format("check_parameters: user '%s' did not request GPUs in partition 'gpu'",
                                 username))
    return slurm.ERROR
end</pre>
<br>
If GRES is not given for the GPU partition, this produces<br>
<br>
sbatch: error: Please request GPU resources in the partition 'gpu', e.g. '#SBATCH --gres=gpu:1' Please see 'man sbatch' for more details)<br>
sbatch: error: Batch job submission failed: Unspecified error<br>
<br>
My questions are:<br>
<br>
1. Is there a better error to return? The 'slurm.ERROR' produces the<br>
generic second error line above (slurm_errno.h just seems to have<br>
ESLURM_MISSING_TIME_LIMIT and ESLURM_INVALID_KNL as errors a plugin<br>
might raise). This is misleading, since the error is in fact known<br>
and specific.<br>
2. Am I right in thinking that 'job_desc' does not, as of 20.02.06, have<br>
a 'gpus' field corresponding to the sbatch/srun option '--gpus'?<br>
<br>
Cheers,<br>
<br>
Loris<br>
<br>
-- <br>
Dr. Loris Bennett (Hr./Mr.)<br>
ZEDAT, Freie Universität Berlin Email <a href="mailto:loris.bennett@fu-berlin.de" target="_blank">loris.bennett@fu-berlin.de</a><br>
<br>
</blockquote></div>
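<div><br></div><div>A minimal sketch of how the check from the question could accept a GPU request made with either --gres or --gpus and return a specific error code instead of slurm.ERROR. It assumes that the Lua plugin exposes the TRES strings as job_desc.tres_per_job (where --gpus is expected to land, and which the filter above does not test), job_desc.tres_per_node (--gres, --gpus-per-node), job_desc.tres_per_socket and job_desc.tres_per_task, and that a GPU request looks like "gpu:2" or "gpu:p100:1". Check these field names and formats against job_submit_lua.c for your release. The value 2072 is the ESLURM_INVALID_GRES number used in the filter above; confirm it in slurm_errno.h. The helpers gpu_count and check_gpu_partition are hypothetical, and the stub slurm table only exists so the sketch can run outside slurmctld.</div>
<pre>
```lua
-- Sketch only: reject jobs in partition "gpu" that request no GPUs.
-- Outside slurmctld there is no 'slurm' table, so stub the members used here;
-- inside job_submit.lua the real table provided by the plugin is kept.
slurm = slurm or {
    SUCCESS = 0,
    ERROR = -1,
    log_user = function(fmt, ...) print(string.format(fmt, ...)) end,
}

-- Value used in the filter above; confirm it in slurm_errno.h.
local ESLURM_INVALID_GRES = 2072

-- Hypothetical helper: number of GPUs in a TRES/GRES string such as
-- "gpu:2", "gpu:p100:1" or "gpu" (no count means 1); 0 if there is no GPU entry.
local function gpu_count(spec)
    if spec == nil then
        return 0
    end
    local total = 0
    for item in string.gmatch(spec, "[^,]+") do
        -- Strip a "gres:" or "gres/" prefix in case a release adds one (assumption).
        local name = item:gsub("^gres[:/]", "")
        if name:match("^gpu") then
            local n = name:match(":(%d+)$")
            total = total + (n and tonumber(n) or 1)
        end
    end
    return total
end

-- Assumed job_desc fields that may carry a GPU request (see the note above).
local GPU_FIELDS = { "tres_per_job", "tres_per_node", "tres_per_socket", "tres_per_task" }

-- Hypothetical check: accept the job if any field asks for at least one GPU.
local function check_gpu_partition(job_desc)
    if job_desc.partition ~= "gpu" then
        return slurm.SUCCESS
    end
    for _, field in ipairs(GPU_FIELDS) do
        if gpu_count(job_desc[field]) > 0 then
            return slurm.SUCCESS
        end
    end
    slurm.log_user("Please request GPUs in partition 'gpu', e.g. '--gres=gpu:1' or '--gpus=1'")
    return ESLURM_INVALID_GRES
end

-- Example: a job without any GPU request is rejected; prints the message, then 2072.
print(check_gpu_partition({ partition = "gpu" }))
```
</pre>
<div>Inside slurm_job_submit, the result of check_gpu_partition(job_desc) could be returned whenever it is not slurm.SUCCESS, so that sbatch reports the GRES-specific error rather than "Unspecified error".</div>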