[slurm-users] good practices
Nigella Sanders
nigella.sanders at gmail.com
Mon Nov 25 09:12:17 UTC 2019
Hi all,
I guess this is a simple matter, but I still find it confusing.
I have to run 20 jobs on our supercomputer.
Each job takes about 8 hours, and each one needs the previous one to be
completed before it can start.
The queue time limit for jobs is 10 hours.
So my first approach was to launch them serially in a loop using srun:
#!/bin/bash
for i in {1..20}; do
    srun --time 08:10:00 [options]
done
However, the SLURM literature keeps saying that 'srun' should only be used for
short command-line tests, so some sysadmins would consider this bad
practice (see this
<https://stackoverflow.com/questions/43767866/slurm-srun-vs-sbatch-and-their-parameters>).
My second approach switched to sbatch:
#!/bin/bash
for i in {1..20}; do
    sbatch --time 08:10:00 [options]
    [polling the queue to see if the job is done]
done
But since sbatch returns to the prompt immediately, I had to add code to check
for job termination. Polling relies on the sleep command and is prone to race
conditions, so sysadmins don't like it either.
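For concreteness, here is a minimal sketch of what the polling variant could look like. The function names (submit_job, wait_for_job, run_chain), the POLL_INTERVAL variable, and the job script name are hypothetical placeholders, and it assumes the installed sbatch supports --parsable, which may not hold on older systems.

```shell
#!/bin/bash
# Sketch only: submit_job, wait_for_job, run_chain and POLL_INTERVAL are
# illustrative names, not part of SLURM or of the original script.

# Submit a job script and print only its numeric job ID.
# Assumes sbatch supports --parsable (output: "jobid" or "jobid;cluster").
submit_job() {
    local out
    out=$(sbatch --parsable --time=08:10:00 "$@") || return 1
    echo "${out%%;*}"
}

# Block until the job no longer shows up in the queue in an active state.
wait_for_job() {
    local jobid=$1 interval=${2:-60}
    while [ -n "$(squeue --noheader --jobs="$jobid" \
            --states=PENDING,CONFIGURING,RUNNING,SUSPENDED,COMPLETING 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Submit n jobs one at a time; the next is submitted only after the
# previous one has left the queue.
run_chain() {
    local n=$1 script=$2 i jobid
    for ((i = 1; i <= n; i++)); do
        jobid=$(submit_job "$script") || return 1
        wait_for_job "$jobid" "${POLL_INTERVAL:-60}"
    done
}
```

It would be called as, for example, `run_chain 20 job.sh`. Note that this only detects that a job has left the queue; it does not tell whether the job succeeded, which is part of why polling feels fragile.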
It seems there is a --wait option in recent versions of SLURM (see
this <https://bugs.schedmd.com/show_bug.cgi?id=1685>), but it is not yet
available on our system.
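On a system where sbatch does support --wait, the loop would presumably reduce to something like the sketch below. The function name run_chain_wait and the job script name are hypothetical; the only SLURM behaviour relied on is that `sbatch --wait` does not return until the job terminates and exits with the job's exit code.

```shell
#!/bin/bash
# Sketch only: run_chain_wait is an illustrative name, and this requires
# a SLURM version whose sbatch supports --wait.

# Submit n jobs one after another; stop at the first job that fails.
run_chain_wait() {
    local n=$1 script=$2 i
    for ((i = 1; i <= n; i++)); do
        # --wait blocks until the job terminates; sbatch then exits
        # with the same exit code as the job.
        sbatch --wait --time=08:10:00 "$script" || {
            echo "job $i failed, stopping the chain" >&2
            return 1
        }
    done
}
```

It would be called as, for example, `run_chain_wait 20 job.sh`, with no polling code needed.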
Is there any preferable/canonical/friendly way to do this?
Any thoughts would be really appreciated,
Regards,
Nigella.