[slurm-users] Need to restart slurmctld for gres jobs to start
tluchko
tluchko at protonmail.com
Fri Jun 24 18:22:44 UTC 2022
------- Original Message -------
On Friday, June 3rd, 2022 at 3:07 PM, tluchko <tluchko at protonmail.com> wrote:
>
> ------- Original Message -------
> On Friday, June 3rd, 2022 at 2:51 AM, Bjørn-Helge Mevik b.h.mevik at usit.uio.no wrote:
>
> > tluchko tluchko at protonmail.com writes:
> >
> > > Jobs only sit in the queue with RESOURCES as the REASON when we
> > > include the flag --gres=bandwidth:ib. If we remove the flag, the jobs
> > > run fine. But we need the flag to ensure that we don't get a mix of IB
> > > and ethernet nodes because they fail in this case.
> >
> > This doesn't answer your real question, but couldn't you just use
> > features for ib and ethernet. Jobs wanting nodes with ib would then
> > specify --constraint=ib, etc.
>
> Thank you for the suggestion. I didn't know about Features. When I was looking for how to do this, I could only find an example using GRES and followed that.
>
> I've made the change and it worked fine in my basic testing. I'll have to wait and see if it continues to work for real jobs.
>
I'm just reporting back that using Features solved my problem. Jobs now start on their own when resources become available.
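
For anyone finding this thread in the archive, here is a minimal sketch of what a Features-based setup can look like. The node names, node ranges, and job script name are hypothetical placeholders, not the actual cluster configuration, and real NodeName lines would normally carry further parameters (CPUs, memory, and so on).

```
# slurm.conf fragment (hypothetical node names)
# Tag each node with the interconnect it has.
NodeName=ibnode[01-04]  Feature=ib
NodeName=ethnode[01-04] Feature=ethernet

# Job submission: only nodes carrying the "ib" feature are eligible,
# so a job never gets a mix of IB and ethernet nodes.
sbatch --constraint=ib job.sh
```

The same constraint can be set inside the job script with an "#SBATCH --constraint=ib" line instead of on the command line.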
Thank you again,
Tyler