[slurm-users] speed / efficiency of sacct vs. scontrol
Davide DelVento
davide.quantum at gmail.com
Mon Feb 27 15:54:32 UTC 2023
> > And if you are seeing a workflow management system causing trouble on
> > your system, probably the most sustainable way of getting this resolved
> > is to file issues or pull requests with the respective project, with
> > suggestions like the ones you made. For snakemake, a second good point
> > to chime in currently would be the issue discussing Slurm job array
> > support: https://github.com/snakemake/snakemake/issues/301
>
> I have to disagree here. I think the onus is on the people in a given
> community to ensure that their software behaves well on the systems they
> want to use, not on the operators of those system. Those of us running
> HPC systems often have to deal with a very large range of different
> pieces of software and time and personnel are limited. If some program
> used by only a subset of the users is causing disruption, then it
> already costs us time and energy to mitigate those effects. Even if I
> had the appropriate skill set, I don't see myself writing many
> patches for workflow managers any time soon.
As someone who has worked in both roles (and to a degree still does) and
therefore can better understand the perspective from both parties, I
side more with David than with Loris here.
Yes, David wrote "or pull requests", but that's an OR.
Loris, if you know of or experience a problem, it takes close to zero
time to file a bug report educating the author of the software about
the problem (or pointing them to places where they can educate
themselves). Otherwise they will never know about it, they will never
fix it, and they may even think it's fine and make the problem
worse. Yes, you could alternatively forbid the use of the problematic
software on the machine (I've done that on our systems), but users
with those needs will find ways to create the very same problem, and
perhaps worse, in other ways (they have done it on our system). Yes,
time is limited, and as operators of HPC systems we often don't have
the time to understand all the nuances and needs of all the users, but
that's not the point I am advocating. In fact it does seem to me that
David is putting the onus on himself and his community to make the
software behave correctly, and he is trying to educate himself about
what "correct" is like. So just give him the input he's looking for,
both here and (if and when snakemake causes troubles on your system)
by opening tickets on that repo, explaining the problem (definitely
not writing a PR for you, sorry David).