[slurm-users] good practices

Yair Yarom irush at cs.huji.ac.il
Mon Nov 25 15:04:49 UTC 2019


Hi,

I'm not sure what a queue time limit of 10 hours means. If jobs can't
wait in the queue for more than 10 hours, then that seems very short for
8-hour jobs.
Generally, there are a few options:
a. The --dependency option of sbatch (either afterok or singleton).
b. The --array option of sbatch with a limit of 1 job at a time (instead of
the for loop): sbatch --array=1-20%1
c. At the end of each job's script, call the sbatch line for the next
job (this is probably the only option if I understood the queue time
limit correctly).
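
For option (a), a minimal sketch of such a chain might look like the
following. It assumes your sbatch supports --parsable (which prints only
the job id); submit_chain, the SBATCH override variable and job.sh are
names made up for illustration, and the [options] from the original loop
would go next to --time.

```shell
#!/bin/bash
# Sketch only: submit N copies of a job script so that each one starts
# after the previous one has finished successfully (afterok).
# SBATCH may be set to another command for a dry run (an assumption of
# this sketch, not a Slurm feature).
submit_chain() {
    local n=$1 script=$2 prev="" jobid i=1
    while [ "$i" -le "$n" ]; do
        if [ -z "$prev" ]; then
            # First job: no dependency
            jobid=$("${SBATCH:-sbatch}" --parsable --time=08:10:00 \
                "$script") || return 1
        else
            # Later jobs: wait for the previous job to end with exit code 0
            jobid=$("${SBATCH:-sbatch}" --parsable --time=08:10:00 \
                --dependency=afterok:"$prev" "$script") || return 1
        fi
        jobid=${jobid%%;*}   # --parsable may print "jobid;cluster"
        echo "step $i: job $jobid"
        prev=$jobid
        i=$((i + 1))
    done
}
```

Running "submit_chain 20 job.sh" would queue all 20 jobs at once and let
the scheduler handle the ordering, so no polling is needed. Note that all
but the first job will sit pending for a long time, which is why this may
not fit if the 10 hour limit really applies to waiting time.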

And indeed, srun should probably be reserved for strictly interactive jobs.
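
Option (c) also stays entirely within sbatch. As a rough sketch, the job
script could end with something like the following. The function name
submit_next, the STEP variable, the SBATCH override and the script path
are all assumptions made for illustration; only --time and --export are
actual sbatch options.

```shell
#!/bin/bash
# Sketch only: at the end of step N, submit step N+1 unless N was the
# last step. The step number travels to the next job through the
# environment (--export=ALL,STEP=...).
submit_next() {
    local step=$1 last=$2 script=$3
    if [ "$step" -ge "$last" ]; then
        echo "step $step was the last one, nothing to submit"
        return 0
    fi
    "${SBATCH:-sbatch}" --time=08:10:00 \
        --export=ALL,STEP=$((step + 1)) "$script"
}

# At the very end of the job script, after the real work has succeeded:
#   submit_next "${STEP:-1}" 20 "$HOME/chain/job.sh"
```

This way only one job is in the queue at any time, and a failed step
simply never submits its successor, provided submit_next is called only
after the work has exited successfully.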

Regards,
    Yair.

On Mon, Nov 25, 2019 at 11:21 AM Nigella Sanders <nigella.sanders at gmail.com>
wrote:

>
> Hi all,
>
> I guess this is a simple matter but I still find it confusing.
>
> I have to run 20 jobs on our supercomputer.
> Each job takes about 8 hours and each one needs the previous one to be
> completed.
> The queue time limit for jobs is 10 hours.
>
> So my first approach is serially launching them in a loop using srun:
>
>
> #!/bin/bash
> for i in {1..20}; do
>     srun --time 08:10:00 [options]
> done
>
> However, the SLURM literature keeps saying that 'srun' should only be used
> for short command-line tests, so some sysadmins would consider this bad
> practice (see this
> <https://stackoverflow.com/questions/43767866/slurm-srun-vs-sbatch-and-their-parameters>
> ).
>
> My second approach switched to sbatch:
>
> #!/bin/bash
> for i in {1..20}; do
>     sbatch --time 08:10:00 [options]
>     [polling the queue to see if the job is done]
> done
>
> But since sbatch returns the prompt immediately, I had to add code to check
> for job termination. Polling relies on the sleep command and is prone to
> race conditions, so sysadmins don't like it either.
>
> I guess there is a --wait option in some recent versions of SLURM (see
> this <https://bugs.schedmd.com/show_bug.cgi?id=1685>), but it is not yet
> available on our system.
>
> Is there any preferable/canonical/friendly way to do this?
> Any thoughts would be really appreciated,
>
> Regards,
> Nigella.
>
>
>