[slurm-users] snakemake and slurm in general

David Laehnemann david.laehnemann at hhu.de
Fri Feb 24 17:22:52 UTC 2023


Hi Loris,

thanks for all the extra info and the pointer to the nextflow issue.
Those are exactly the conversations that need to happen to make these
systems work for everybody: for the users who need to manage huge
analyses with incredibly complicated job dependencies, and for the
admins who need to keep clusters running and working for lots of users.
For my part, I try to stay in contact with "my" sysadmins wherever I
work, especially when I am doing things that I suspect might be
stretching the respective system...

I have started looking into Slurm job arrays and snakemake job grouping
a bit more. From what I currently understand, snakemake's job grouping
actually achieves something similar to job arrays, but it could even be
better from Slurm's perspective: it batches groups of snakemake jobs
into one Slurm job, requesting the combined resources they need, so
Slurm does not have to track the individual (snakemake) jobs, only the
one batched job.
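
To make that concrete, here is a rough sketch of how this looks from
the workflow side (rule, group and file names are just made-up
examples):

    rule map_reads:
        input:
            "fastq/{sample}.fastq"
        output:
            "mapped/{sample}.bam"
        group:
            "mapping"
        threads: 4
        resources:
            mem_mb=8000
        shell:
            "bwa mem -t {threads} ref.fa {input} | samtools sort -o {output}"

Running this with something like

    snakemake --slurm --jobs 50 --group-components mapping=20

(with a recent snakemake version, or via an equivalent cluster profile)
should then bundle up to 20 of these mapping jobs into a single Slurm
submission with their combined resource requirements, instead of 20
separate sbatch calls.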

Also, I think direct support for job arrays isn't straightforward to
implement with snakemake's DAG (directed acyclic graph) logic, but it
probably isn't impossible either. It has in fact been discussed, even
recently (and, I think, with a recent contribution by you ;) ):
https://github.com/snakemake/snakemake/issues/301
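
For comparison, what array support would have to map the DAG onto is
essentially a single submission of many near-identical tasks, something
along these lines (paths and resources are again made up):

    #!/bin/bash
    #SBATCH --array=1-100
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=8G

    # each array task looks up its sample via its task ID
    sample=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)
    bwa mem -t 4 ref.fa "fastq/${sample}.fastq" > "mapped/${sample}.sam"

That is one sbatch call (and, as far as I understand, essentially one
pending job record) for the whole batch, which is what makes arrays so
cheap for the Slurm controller.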

I'll try to keep revisiting this, if I can find the time.

cheers,
david


On Fri, 2023-02-24 at 08:36 +0100, Loris Bennett wrote:
> Hi David,
> 
> David Laehnemann <david.laehnemann at hhu.de> writes:
> 
> > Hi Loris,
> > 
> > I gave this a new subject, as this has nothing to do with my original
> > question.
> > 
> > Maybe this is what you were looking for in the snakemake
> > documentation:
> > 
> > https://snakemake.readthedocs.io/en/latest/executing/grouping.html#job-grouping
> > 
> > You can basically bundle groups of (snakemake) jobs together and
> > snakemake will submit them as batch / array jobs to slurm.
> > 
> > However, from what I understand, this means that you will have to
> > tailor your workflow's group setup to the respective slurm cluster and
> > the nodes that it has, to make sure that your array jobs fit onto
> > nodes. This is usually not very portable. And submitting individual
> > jobs is much more flexible in using available resources, because this
> > will not block larger amounts of resources at once. So there's always
> > a tradeoff between different optimizations here.
> > 
> > In addition, there are very clear limits to how many jobs slurm can
> > handle in its queue, see for example this discussion:
> > https://bugs.schedmd.com/show_bug.cgi?id=2366
> > 
> > So, to me, it makes a lot of sense to use snakemake to, for example,
> > limit the total number of cores it can request to something matching
> > the actual number of cores available on the slurm cluster. Otherwise
> > the queue is quickly overwhelmed, my own (and other people's) jobs
> > will simply not be scheduled, and running workflow instances will
> > abort.
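> > 
> > For instance, something like
> > 
> >   snakemake --slurm --jobs 100
> > 
> > should cap the number of Slurm jobs that snakemake has submitted or
> > running at any one time at 100, no matter how many jobs the overall
> > DAG still contains.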
> 
> We seem to be looking at this from very different perspectives.  I am
> not sure what you are referring to with "the queue is quickly
> overwhelmed" and "jobs will simply not be scheduled".  This, in my
> opinion, should not be happening whether you use snakemake or not.
> Maybe you should talk to the administrators about such issues.
> 
> > In general, Slurm has these limitations, other cluster systems have
> > others. Workflow managers have to accommodate many different systems
> > and setups. At least for snakemake I know that as a result it provides
> > very generalizable solutions for resource management. And I actually
> > think that this is because such systems originated in research
> > environments where HPCs are quite common, so they try to interact with
> > them as nicely as possible.
> > 
> > But both snakemake and nextflow are Open Source projects, maintained
> > by a community, so they depend on people implementing things like
> > cluster system support without getting much credit for it (apart from
> > the fact that stuff then works for them on their local system). And
> > then it depends on people with deeper knowledge of such cluster
> > systems providing their help along the way, which is why I am on this
> > list now, asking for insights. So feel free to dig into the respective
> > code bases with a bit of that grumpy energy, making snakemake or
> > nextflow a bit better in how they deal with Slurm.
> 
> I have every sympathy for people working on Open Source projects and
> am very happy to offer assistance; I have commented on the lack of
> support for job arrays in Nextflow here:
> 
>   https://github.com/nextflow-io/nextflow/issues/1477
> 
> This is in fact where I learned about the potential negative impact of
> multiple similar jobs on backfilling.
> 
> Cheers,
> 
> Loris
> 
> > cheers,
> > david
> > 
> > 
> > 
> > On Thu, 2023-02-23 at 15:38 +0100, Loris Bennett wrote:
> > > Hi David,
> > > 
> > > David Laehnemann <david.laehnemann at hhu.de> writes:
> > > 
> > > [snip (16 lines)]
> > > 
> > > > P.S.: @Loris and @Noam: Exactly, snakemake is a piece of software
> > > > distinct from slurm that you can use to orchestrate large analysis
> > > > workflows---on anything from a desktop or laptop computer to all
> > > > kinds of cluster / cloud systems. In the case of Slurm it will
> > > > submit each analysis step on a particular sample as a separate
> > > > job, specifying the resources it needs. The scheduler then handles
> > > > it from there. But because you can have (hundreds of) thousands of
> > > > jobs, with dependencies among them, you can't just submit
> > > > everything all at once, but have to keep track of where you are
> > > > at. And make sure you don't submit much more than the system can
> > > > handle at any time, so you don't overwhelm the Slurm queue.
> > > 
> > > [snip (86 lines)]
> > > 
> > > I know what Snakemake and other workflow managers, such as Nextflow,
> > > are for, but my maybe ill-informed impression is that, while
> > > something of this sort is obviously needed to manage complex
> > > dependencies, the current solutions, probably because they originated
> > > outside the HPC context, try to do too much.  You say Snakemake helps
> > > 
> > >   make sure you don't submit much more than the system can handle
> > > 
> > > but that in my view should not be necessary.  Slurm has
> > > configuration parameters which can be set to limit the number of
> > > jobs a user can submit and/or run.  And when it comes to submitting
> > > (hundreds of) thousands of jobs, Nextflow, for example, currently
> > > can't create job arrays, and so generates large numbers of jobs with
> > > identical resource requirements, which can prevent backfill from
> > > working properly.  Skimming the documentation for Snakemake, I also
> > > could not find any reference to Slurm job arrays, so this could also
> > > be an issue.
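> > > 
> > > (Such limits can, for instance, be set as association or QOS limits,
> > > along the lines of
> > > 
> > >   sacctmgr modify user where name=someuser set MaxSubmitJobs=5000
> > > 
> > > with "someuser" just a placeholder; the exact limit names depend on
> > > how accounting is set up on the cluster.)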
> > > 
> > > Just my slightly grumpy 2¢.
> > > 
> > > Cheers,
> > > 
> > > Loris
> > > 



