[slurm-users] job_submit.lua - choice of error on failure / job_desc.gpus?

Loris Bennett loris.bennett at fu-berlin.de
Fri Dec 4 12:58:59 UTC 2020


Hi,

I want to reject jobs that don't specify any GPUs when accessing our GPU
partition and have the following in job_submit.lua:

  if (job_desc.partition == "gpu" and job_desc.gres == nil) then
     -- Tell the submitting user why the job is being rejected
     slurm.log_user("Please request GPU resources in the partition 'gpu', " ..
                    "e.g. '#SBATCH --gres=gpu:1'. " ..
                    "Please see 'man sbatch' for more details.")
     -- 'username' is assumed to be set earlier in slurm_job_submit (not shown)
     slurm.log_info(string.format("check_parameters: user '%s' did not request GPUs in partition 'gpu'",
                                  username))
     return slurm.ERROR
  end

If GRES is not given for the GPU partition, this produces

  sbatch: error: Please request GPU resources in the partition 'gpu', e.g. '#SBATCH --gres=gpu:1'. Please see 'man sbatch' for more details.
  sbatch: error: Batch job submission failed: Unspecified error

My questions are:

1. Is there a better error to return?  Returning 'slurm.ERROR' produces
   the generic second error line above (slurm_errno.h only seems to have
   ESLURM_MISSING_TIME_LIMIT and ESLURM_INVALID_KNL as errors a plugin
   might raise).  This is misleading, since the error is in fact known
   and specific.
2. Am I right in thinking that 'job_desc' does not, as of 20.02.06, have
   a 'gpus' field corresponding to the sbatch/srun option '--gpus'?
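If '--gpus' is exposed under some other name, the check would have to look at both options. A rough sketch of such a check, assuming (unverified for 20.02.06) that '--gres' shows up as job_desc.gres and '--gpus' as job_desc.tres_per_job; if a field is not exposed it should simply read as nil, so the check falls back to '--gres' only:

```lua
-- Sketch only. ASSUMPTIONS (not confirmed for 20.02.06):
--   * '--gres=gpu:N' is visible to the plugin as job_desc.gres
--   * '--gpus=N' is visible as job_desc.tres_per_job
-- A field that is not exposed is assumed to read as nil.

-- True if a GRES/TRES string such as "gpu:1" or "gres:gpu:2" mentions a GPU
local function mentions_gpu(spec)
   return spec ~= nil and string.find(spec, "gpu", 1, true) ~= nil
end

-- True if the job asks for GPUs via either --gres or --gpus
local function requests_gpu(job_desc)
   return mentions_gpu(job_desc.gres) or mentions_gpu(job_desc.tres_per_job)
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == "gpu" and not requests_gpu(job_desc) then
      slurm.log_user("Please request GPU resources in the partition 'gpu', " ..
                     "e.g. '#SBATCH --gres=gpu:1' or '#SBATCH --gpus=1'.")
      -- Still the generic error (see question 1)
      return slurm.ERROR
   end
   return slurm.SUCCESS
end

-- The plugin also expects this function to be defined
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

The substring match is deliberately loose so that it accepts both "gpu:1" and "gres:gpu:1" style strings; the field name tres_per_job is exactly what question 2 is about, so treat it as an assumption.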

Cheers,

Loris

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
