[slurm-users] speed / efficiency of sacct vs. scontrol
Ümit Seren
uemit.seren at gmail.com
Mon Feb 27 16:05:12 UTC 2023
As a side note:
In Slurm 23.x, a new rate-limiting feature for client RPC calls was added
(see this commit:
https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e
).
This would give operators the ability to limit the negative effect of
workflow managers on the scheduler.
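
To illustrate the general idea, here is a minimal sketch of per-user
token-bucket rate limiting, a common way to allow short bursts of requests
while capping the sustained rate. This is not Slurm's code: the class name,
parameter names, and numbers are hypothetical and chosen only for
illustration.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Per-user token bucket: permits short bursts, then a steady rate.

    Illustrative only; names and default values are hypothetical.
    """

    def __init__(self, bucket_size=30, refill_rate=2.0, clock=time.monotonic):
        self.bucket_size = bucket_size   # maximum burst of requests per user
        self.refill_rate = refill_rate   # tokens added back per second
        self.clock = clock               # injectable clock, useful for testing
        # Every user starts with a full bucket.
        self.tokens = defaultdict(lambda: float(bucket_size))
        self.last_seen = {}

    def allow(self, user):
        """Return True if the request may proceed, False if the user should back off."""
        now = self.clock()
        elapsed = now - self.last_seen.get(user, now)
        self.last_seen[user] = now
        # Refill in proportion to the time elapsed, capped at the bucket size.
        self.tokens[user] = min(
            self.bucket_size, self.tokens[user] + elapsed * self.refill_rate
        )
        if self.tokens[user] >= 1.0:
            self.tokens[user] -= 1.0
            return True
        return False
```

A workflow manager that polls job status in a tight loop would drain its
own bucket quickly and then be told to back off, while other users'
requests would be unaffected because each user has a separate bucket.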
On Mon, Feb 27, 2023 at 4:57 PM Davide DelVento <davide.quantum at gmail.com>
wrote:
> > > And if you are seeing a workflow management system causing trouble on
> > > your system, probably the most sustainable way of getting this resolved
> > > is to file issues or pull requests with the respective project, with
> > > suggestions like the ones you made. For snakemake, a second good place
> > > to chime in right now would be the issue discussing Slurm job array
> > > support: https://github.com/snakemake/snakemake/issues/301
> >
> > I have to disagree here. I think the onus is on the people in a given
> > community to ensure that their software behaves well on the systems they
> > want to use, not on the operators of those systems. Those of us running
> > HPC systems often have to deal with a very large range of different
> > pieces of software, and time and personnel are limited. If some program
> > used by only a subset of the users is causing disruption, then it
> > already costs us time and energy to mitigate those effects. Even if I
> > had the appropriate skill set, I don't see myself writing many
> > patches for workflow managers any time soon.
>
> As someone who has worked in both roles (and to a degree still does) and
> can therefore better understand the perspectives of both parties, I
> side more with David than with Loris here.
>
> Yes, David wrote "or pull requests", but that's an OR.
>
> Loris, if you know of or experience a problem, it takes close to zero
> time to file a bug report educating the author of the software about
> the problem (or pointing them to places where they can educate
> themselves). Otherwise they will never know about it, they will never
> fix it, and potentially they think it's fine and will make the problem
> worse. Yes, you could alternatively forbid the use of the problematic
> software on the machine (I've done that on our systems), but users
> with those needs will find ways to create the very same problem, and
> perhaps worse, in other ways (they have done it on our system). Yes,
> time is limited, and as operators of HPC systems we often don't have
> the time to understand all the nuances and needs of all the users, but
> that's not the point I am advocating. In fact it does seem to me that
> David is putting the onus on himself and his community to make the
> software behave correctly, and he is trying to educate himself about
> what "correct" looks like. So just give him the input he's looking for,
> both here and (if and when snakemake causes trouble on your system)
> by opening tickets on that repo, explaining the problem (definitely
> not writing a PR for you, sorry David).
>
>