[slurm-users] Job Step Resource Requests are Ignored

Wed May 6 18:37:50 UTC 2020

That's great! Thanks David!

On Wed, May 6, 2020 at 11:35 AM David Braun <dlbraun at umich.edu> wrote:

> I'm not sure I understand the problem. If you want to make sure the
> preamble and postamble run even if the main job doesn't run, you can use '-d'.
>
> From the sbatch man page:
>
> -d, --dependency=<dependency_list>
>     Defer the start of this job until the specified dependencies have
>     been satisfied completed. <dependency_list> is of the form
>     <type:job_id[:job_id][,type:job_id[:job_id]]> or
>     <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies must
>     be satisfied if the "," separator is used. Any dependency may be
>     satisfied if the "?" separator is used. Many jobs can share the same
>     dependency and these jobs may even belong to different users. The
>     value may be changed after job submission using the scontrol command.
>     Once a job dependency fails due to the termination state of a
>     preceding job, the dependent job will never be run, even if the
>     preceding job is requeued and has a different termination state in a
>     subsequent execution.
>
>
> For instance, create a script that contains this:
>
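> # --parsable makes sbatch print just the job ID instead of "Submitted batch job <id>"
> preamble_id=$(sbatch --parsable preamble.job)
> main_id=$(sbatch --parsable -d "afterok:${preamble_id}" main.job)
> sbatch -d "afterany:${main_id}" postamble.job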
>
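A slightly more defensive variant of the dependency chain quoted above, as a rough sketch only. It assumes `sbatch --parsable` prints `jobid` or `jobid;cluster`, and it adds one design choice that is not in the original suggestion: if the main job is rejected at submission time (for example because it asks for more CPUs than the cluster has), the postamble is chained to the preamble instead. The `SBATCH_CMD` variable and the `submit_chain` function are hypothetical names introduced here so the flow can be dry-run with a stub.

```shell
#!/bin/bash
# Sketch: submit preamble -> main -> postamble as three dependent jobs.
# SBATCH_CMD is a hypothetical override (defaults to the real sbatch).
SBATCH_CMD="${SBATCH_CMD:-sbatch}"

submit_chain() {
    local preamble_id main_id dep

    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    preamble_id=$("$SBATCH_CMD" --parsable preamble.job) || return 1
    preamble_id=${preamble_id%%;*}

    # main.job runs only if the preamble finished successfully.
    if main_id=$("$SBATCH_CMD" --parsable -d "afterok:${preamble_id}" main.job); then
        main_id=${main_id%%;*}
        dep="afterany:${main_id}"
    else
        # Assumption: if sbatch rejects main.job outright, still run the
        # postamble once the preamble has terminated.
        main_id="rejected"
        dep="afterany:${preamble_id}"
    fi

    "$SBATCH_CMD" -d "$dep" postamble.job >/dev/null || return 1
    echo "$preamble_id $main_id"
}
```

Calling `submit_chain` from a wrapper script prints the preamble and main job IDs (or `rejected` for the latter), which can then be logged or passed to `squeue`.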
> Best,
>
> D
>
> On Wed, May 6, 2020 at 2:19 PM Maria Semple <maria at rstudio.com> wrote:
>
>> Hi Chris,
>>
>> I think my question isn't quite clear, but I'm also pretty confident the
>> answer is no at this point. The idea is that the script is sort of like a
>> template for running a job, and an end user can submit a custom job with
>> their own desired resource requests which will end up filling in the
>> template. I'm not in control of the Slurm cluster that will ultimately run
>> the job, nor the details of the job itself. For example, template-job.sh
>> might look like this:
>>
>> #!/bin/bash
>> srun -c 1 --mem=1k echo "Preamble"
>> srun -c <CPUs> --mem=<Memory>m /bin/sh -c <user's shell script>
>> srun -c 1 --mem=1k echo "Postamble"
>>
>> My goal is that even if the user requests 10 CPUs when the cluster only
>> has 4 available, the Preamble and Postamble steps will always run. But as I
>> said, it seems like that's not possible since the maximum number of CPUs
>> needs to be set on the sbatch allocation and the whole job would be
>> rejected on the basis that too many CPUs were requested. Is that correct?
>>
>> On Tue, May 5, 2020, 11:13 PM Chris Samuel <chris at csamuel.org> wrote:
>>
>>> On Tuesday, 5 May 2020 11:00:27 PM PDT Maria Semple wrote:
>>>
>>> > Is there no way to achieve what I want then? I'd like the first and
>>> > last job steps to always be able to run, even if the second step needs
>>> > too many resources (based on the cluster).
>>>
>>> That should just work.
>>>
>>> #!/bin/bash
>>> #SBATCH -c 2
>>> #SBATCH -n 1
>>>
>>> srun -c 1 echo hello
>>> srun -c 4 echo big wide
>>> srun -c 1 echo world
>>>
>>> gives:
>>>
>>> hello
>>> srun: Job step's --cpus-per-task value exceeds that of job (4 > 2). Job step may never run.
>>> srun: error: Unable to create step for job 604659: More processors requested than permitted
>>> world
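The behaviour shown in the output above (a rejected step does not stop the batch script) could be wrapped up for the template use case roughly as follows. This is only a sketch: `run_template` and the `SRUN_CMD` override are hypothetical names, the step options are copied from the template earlier in the thread, and it assumes the batch script does not run under `set -e`.

```shell
#!/bin/bash
# Sketch of a template batch script body: always run the preamble and
# postamble steps, and hand back the main step's exit status.
# SRUN_CMD is a hypothetical override (defaults to the real srun).
SRUN_CMD="${SRUN_CMD:-srun}"

run_template() {
    # $1 = CPUs for the main step, $2 = memory in MB, $3 = user's shell script
    local cpus=$1 mem_mb=$2 user_script=$3 main_rc

    "$SRUN_CMD" -c 1 --mem=1k echo "Preamble"

    # A rejected or failed step only makes srun exit non-zero; the script
    # carries on to the next line.
    "$SRUN_CMD" -c "$cpus" --mem="${mem_mb}m" /bin/sh -c "$user_script"
    main_rc=$?

    "$SRUN_CMD" -c 1 --mem=1k echo "Postamble"
    return "$main_rc"
}
```

Inside the batch script this would be called as, for example, `run_template 4 2048 "./user-script.sh"`. Note that this only helps once the allocation itself has been granted; it does not change what `sbatch` accepts at submission time.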
>>>
>>> > As a side note, do you know why it's not even possible to restrict the
>>> > number of resources a single step uses (i.e. set fewer CPUs than are
>>> > available to the full job)?
>>>
>>> My suspicion is that you've not set up Slurm to use cgroups to restrict
>>> the resources a job can use to just those requested.
>>>
>>> https://slurm.schedmd.com/cgroups.html
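One way to check the cgroup suspicion above from a login node, as a rough sketch. It assumes `scontrol show config` prints a `TaskPlugin = ...` line listing the active task plugins; `uses_task_cgroup` and the `SCONTROL_CMD` override are hypothetical names used here for illustration.

```shell
#!/bin/bash
# Sketch: report whether the task/cgroup plugin appears in the running
# Slurm configuration. SCONTROL_CMD is a hypothetical override.
SCONTROL_CMD="${SCONTROL_CMD:-scontrol}"

uses_task_cgroup() {
    # Match the TaskPlugin line only (not TaskPluginParam), then look for
    # task/cgroup among the comma-separated plugins.
    "$SCONTROL_CMD" show config \
        | grep -E '^TaskPlugin[[:space:]]*=' \
        | grep -q 'task/cgroup'
}
```

For example, `uses_task_cgroup && echo "task/cgroup is enabled"`. Even with the plugin enabled, confinement to the requested cores also depends on settings in cgroup.conf such as `ConstrainCores=yes`, so a positive result here is necessary rather than sufficient.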
>>>
>>> All the best,
>>> Chris
>>> --
>>>   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>>>

-- 
Thanks,
Maria