[slurm-users] Array Job Node Allocation
ej4 at sanger.ac.uk
Wed Mar 21 06:20:13 MDT 2018
Thanks for the suggestion. This does seem like a good way forward. I
will look into it.
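
For the archive, here is a rough sketch of what the I/O gres idea quoted below might look like. It is untested. The gres name "io", the node names, the CPU and memory figures, the count of 100 units per node, and the 25 units requested per task are all made-up illustrative values, not anything measured on our cluster.

```
# slurm.conf (fragment) -- hypothetical node names and sizes
GresTypes=io
NodeName=node[01-10] CPUs=32 RealMemory=256000 Gres=io:100

# gres.conf (fragment) -- a countable gres with no device file behind it
NodeName=node[01-10] Name=io Count=100

# Submission: each array task asks for 25 "io" units, so at most
# 4 such tasks can share a node regardless of how many CPUs are free.
#   sbatch --array=1-1000 --gres=io:25 job.sh
```

Slurm only does the bookkeeping here; it does not measure or throttle real I/O, so the scheme is only as good as the per-job estimates. A cluster-wide cap (the "ioglobal" part of the suggestion) might be approximated with the Licenses= option in slurm.conf and --licenses at submit time, though I have not tried that.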
On 21/03/2018 13:34, Gareth.Williams at csiro.au wrote:
> Hi Emyr,
> Perhaps you could be more explicit about the I/O boundedness and have jobs request an I/O gres as well as compute and memory resources. You could then set the amount of I/O resource per node (and maybe globally, possibly as separate iolocal and ioglobal resources). That way you could avoid I/O contention locally and globally instead of just shifting the problem about and hoping that spreading the load helps. Another option is to declare that there are fewer CPUs per node (which has its own problems).
> Of course, difficulties in estimating the I/O needs per job might make this whole idea unworkable... Mostly I wanted to point out that there are other ways of thinking about the problem, and round-robin may just shift the problem around in an ugly way.
> best wishes,
> -----Original Message-----
> From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Emyr James
> Sent: Wednesday, 21 March 2018 4:54 PM
> To: slurm-users at schedmd.com
> Subject: [slurm-users] Array Job Node Allocation
> Dear all,
> I would like an array job to load nodes with a round-robin allocation instead of what seems to be the default method of loading the first node until it is full before moving on to the next node. Our cluster is used for bioinformatics, and jobs tend to be serial, high-throughput work with one or a few threads on a single node, as opposed to jobs distributed across nodes. The default, whereby nodes are filled sequentially, doesn't work well for us given that jobs tend to be I/O bound.
> I've seen the thread starting at
> https://groups.google.com/d/msg/slurm-users/uiKuFF8C-kU/mnJ1VcESBwAJ but I can't see the solution mentioned there (periodically setting node weights according to load) working for array jobs as it submits jobs in clumps.
> The LLN strategy seems to be what I'm after, but as in the thread above I can't get it to work. Has anyone managed to get this working?
> The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.