[slurm-users] introduce short delay starting multiple parallel jobs with srun

John Hearns hearnsj at gmail.com
Thu Nov 9 08:39:06 MST 2017


Renat,
   I know that this is not going to be helpful. I can understand that,
if you are using NFS storage, 20(*) processes might not be able to open
files at the same time.
I would consider the following:

a) looking at your storage. This is why HPC systems have high-performance,
parallel storage systems.
    You could consider installing a high-performance storage system.

b) if there is no option to get better storage, then I ask: how is this
data being accessed?
    If you have multiple compute nodes, and the data is read-only, then
consider copying the data across to TMPDIR on each compute node as a
pre-job step or at the start of the job (see the sketch below).
If the speed of access to the data is critical, then you might even consider
creating a ramdisk for TMPDIR - then you might see noticeably better
performance.
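
As a minimal sketch of the copy-to-TMPDIR idea - assuming a single
read-only input file, and with the paths and the program name
(my_analysis) as placeholders - sbcast is the standard Slurm tool for
staging a file out to every node of an allocation:

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=5

    # Optional ramdisk variant: point TMPDIR at tmpfs before staging,
    # e.g. export TMPDIR=/dev/shm/$SLURM_JOB_ID and create it on each
    # node with: srun --ntasks-per-node=1 mkdir -p "$TMPDIR"

    # Stage the input from shared storage to node-local TMPDIR; sbcast
    # fans one file out to all allocated nodes in a single pass, so
    # only a single reader touches NFS.
    sbcast /shared/data/input.dat "$TMPDIR/input.dat"

    # Every task now opens its node-local copy instead of NFS.
    srun ./my_analysis "$TMPDIR/input.dat"

For whole directory trees sbcast will not help directly; the usual
workaround is one cp -r (or a tar pipe) per node via
srun --ntasks-per-node=1.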


(*) 20 - err, that does sound a bit low...

On 9 November 2017 at 15:55, Gennaro Oliva <oliva.g at na.icar.cnr.it> wrote:

> Hi Renat,
>
> On Thu, Nov 09, 2017 at 03:46:23PM +0100, Yakupov, Renat /DZNE wrote:
> > I tried that. It doesn't even queue the job with an error:
> > sbatch: unrecognized option '--array=1-24'
> > sbatch: error: Try help for more information.
>
> What version of Slurm are you using?
> Regards
> --
> Gennaro Oliva
>
>