[slurm-users] How to check if there's a reservation

Prentice Bisbal pbisbal at pppl.gov
Fri Jun 15 15:22:26 MDT 2018


I agree. I brought it up with SchedMD after I spent almost an entire day 
trying to figure out why jobs were queued up but not running. I figured 
the reason column would say "reservation" if that was the issue. 
Instead, it provided some completely useless message, making me think 
the problem was elsewhere. When I confirmed it was a reservation (with the 
help of this list/you), I wanted to break something.
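
For anyone who wants to script this check, here is a minimal sketch that filters pending jobs by the text in squeue's reason column (%r). The function name pending_reserved is made up for illustration, and matching on "reserv" is an assumption: the exact reason string Slurm reports is not given in this thread and may differ between versions.

```shell
#!/bin/sh
# pending_reserved (hypothetical helper): read "JOBID REASON" lines on stdin,
# for example from
#   squeue -h -t PD -o '%i %r'
# and print the IDs of jobs whose reason text mentions a reservation.
# The "reserv" pattern is an assumption, not a documented reason code.
pending_reserved() {
    awk '{
        id = $1
        $1 = ""                     # drop the job ID, keep the reason text
        if (tolower($0) ~ /reserv/) print id
    }'
}
```

A typical call would be: squeue -h -t PD -o '%i %r' | pending_reserved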

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov

On 06/15/2018 01:26 PM, Ryan Novosielski wrote:
> That’s great news; this is a vFAQ at our site.
>
>> On Jun 13, 2018, at 1:37 PM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>
>> Just to revisit this: jobs that are queued but prevented from running will have a more useful reason in 18.08, which will address one of my issues with reservation collisions.
>> https://bugs.schedmd.com/show_bug.cgi?id=5138
>> https://bugs.schedmd.com/show_bug.cgi?id=4987
>>
>> Prentice Bisbal
>> Lead Software Engineer
>> Princeton Plasma Physics Laboratory
>>
>> http://www.pppl.gov
>> On 05/11/2018 10:36 AM, Douglas Jacobsen wrote:
>>> A feature that many Slurm users might like is sbatch --time-min. Using both --time-min and --time, a user can specify the range of acceptable wall time limits. This can make it much easier to keep jobs running right up to the maintenance reservation, e.g.:
>>>
>>> sbatch --time-min=30:00 --time=48:00:00 script.sh
>>>
>>> would allow the job to schedule for any time slot between 30 minutes and 2 days in length. If the user has some mechanism for job chaining or similar, this can allow them to make the most of backfill opportunities.
>>>
>>> -Doug
>>>
>>> ----
>>> Doug Jacobsen, Ph.D.
>>> NERSC Computer Systems Engineer
>>> National Energy Research Scientific Computing Center
>>> dmjacobsen at lbl.gov
>>>
>>> ------------- __o
>>> ---------- _ '\<,_
>>> ----------(_)/  (_)__________________________
>>>
>>>
>>>
>>> On Fri, May 11, 2018 at 7:27 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>> In the past we used the Lua job submit plugin to block jobs that would
>>> intersect maintenance reservations.  I would look at that.
>>>
>>> -Paul Edmon-
>>>
>>>
>>> On 05/11/2018 08:19 AM, Bill Wichser wrote:
>>>> The problem is that reservations can exist yet have no effect on
>>>> the submitted job if it would run to completion before the reservation
>>>> starts. One can pull the starting time simply using something like this:
>>>>
>>>> scontrol show res -o | awk '{print $2}'
>>>>
>>>> with output
>>>>
>>>> StartTime=2018-06-12T06:00:00
>>>> StartTime=2018-06-12T06:00:00
>>>>
>>>> You'd need more code around that, obviously, to determine if this
>>>> start time might hold up the job.
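
One rough way to sketch that extra code: compare the job's requested wall time against the reservation's start time. The function name job_fits_before is hypothetical, and the sketch assumes GNU date (for the -d option). It also assumes the job would start at the time given and that the reservation covers nodes the job could use, neither of which the scheduler guarantees.

```shell
#!/bin/sh
# job_fits_before NOW_EPOCH WALLTIME_SECONDS RES_START  (hypothetical helper)
# Exit 0 if a job starting at NOW_EPOCH and running for WALLTIME_SECONDS
# would end no later than RES_START, a timestamp such as 2018-06-12T06:00:00.
# Exit 1 if it would overlap, 2 if RES_START cannot be parsed.
# Assumes GNU date, which accepts -d with this timestamp format.
job_fits_before() {
    now=$1
    walltime=$2
    res_start=$(date -d "$3" +%s) || return 2
    [ $((now + walltime)) -le "$res_start" ]
}
```

For example, job_fits_before "$(date +%s)" 3600 2018-06-12T06:00:00 succeeds only if a one-hour job started now would end by that start time.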
>>>>
>>>> Bill
>>>>
>>>>
>>>> On 05/10/2018 04:23 PM, Prentice Bisbal wrote:
>>>>> Dear Slurm Users,
>>>>>
>>>>> We've started using maintenance reservations. As you would expect,
>>>>> this caused some confusion for users who were wondering why their
>>>>> jobs were queuing up and not running. Some of my users provide a
>>>>> public service of sorts that automatically submits jobs to our
>>>>> cluster. They would like to have their submission framework
>>>>> automatically detect if there's a reservation that may interfere with
>>>>> their jobs, and act accordingly.
>>>>>
>>>>> What is the best way to do this? Typically, in my shell scripts, I
>>>>> have some command that tests something, and then check the exit code
>>>>> returned by the command. For example, to check if my name is in the file
>>>>> 'foo.txt', I'd do something like this:
>>>>>
>>>>> grep -iq prentice foo.txt
>>>>> retval=$?
>>>>> if [ $retval -eq 0 ]; then
>>>>>       echo "Prentice found"
>>>>> else
>>>>>       echo "Prentice not found"
>>>>> fi
>>>>> unset retval
>>>>>
>>>>> Or something like that. I was also thinking this might work, too:
>>>>>
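>>>>> # Count reservation records rather than output lines, in case scontrol
>>>>> # prints a message line when there are no reservations.
>>>>> num_res=$(scontrol -o show res | grep -c '^ReservationName=')
>>>>> if [ "$num_res" -eq 0 ]; then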
>>>>>       echo "No reservations found"
>>>>> else
>>>>>       echo "$num_res reservation(s) found"
>>>>> fi
>>>>>
>>>>> Are there any better or other ways that you would recommend? Also, if
>>>>> there's more than one, are they listed in any kind of order in the
>>>>> scontrol or sinfo output (soonest first, soonest last, etc.)? From
>>>>> the man page, it looks like 'scontrol show reservation' doesn't
>>>>> provide any sorting.
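
If the output order cannot be relied on, one option is to sort it yourself. The sketch below assumes the one-line records from 'scontrol show res -o' carry ReservationName= and StartTime= fields (the StartTime= field appears elsewhere in this thread) and that start times use the 2018-06-12T06:00:00 format, which sorts correctly as plain text. The function name res_start_times is made up for illustration. Fields are looked up by key rather than by position, so lines without a StartTime= field are skipped.

```shell
#!/bin/sh
# res_start_times (hypothetical helper): read one-line reservation records on
# stdin and print "StartTime ReservationName" pairs, soonest first.
res_start_times() {
    awk '{
        name = ""; start = ""
        for (i = 1; i <= NF; i++) {
            # Take everything after the first "=" as the value.
            if ($i ~ /^ReservationName=/) name = substr($i, index($i, "=") + 1)
            if ($i ~ /^StartTime=/) start = substr($i, index($i, "=") + 1)
        }
        if (start != "") print start, name
    }' | sort
}
```

Usage would look like: scontrol show res -o | res_start_times | head -n 1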
>>>>>
>>>>> Prentice
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>



