[slurm-users] weight setting not working

Andy Leung Yin Sui moliulay at gmail.com
Wed Mar 13 02:40:55 UTC 2019


Thank you for your reply. I was running 18.08.1; after updating to
18.08.6, the problem is solved. Thank you.
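Since the fix came down to the Slurm release, it is worth checking the installed version before debugging scheduler behaviour. `sinfo --version` prints it, assumed here to be in the form `slurm 18.08.1`. Below is a hypothetical Python helper, not part of Slurm, that compares such a string against 18.08.6, the release that resolved the problem in this thread:

```python
import re


def parse_slurm_version(text: str) -> tuple:
    """Extract (major, minor, micro) from output such as 'slurm 18.08.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Slurm version found in {text!r}")
    # int() drops the leading zero, so '08' becomes 8.
    return tuple(int(part) for part in match.groups())


def at_least(text: str, minimum: tuple) -> bool:
    """Return True if the version found in `text` is >= `minimum`, e.g. (18, 8, 6)."""
    return parse_slurm_version(text) >= minimum


if __name__ == "__main__":
    print(at_least("slurm 18.08.1", (18, 8, 6)))  # False
    print(at_least("slurm 18.08.6", (18, 8, 6)))  # True
```

The threshold is simply the version confirmed in this thread. Per the reply below, the fix may have landed earlier, so check the release notes before relying on it.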

On Tue, 12 Mar 2019 at 20:23, Eli V <eliventer at gmail.com> wrote:
>
> On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui <moliulay at gmail.com> wrote:
> >
> > Hi,
> >
> > I am new to Slurm and want to use the Weight option to schedule jobs.
> > I have some machines with the same hardware configuration and GPU
> > cards. I use a QoS to require users to request at least 1 GPU (gres)
> > when submitting jobs.
> > The machines serve multiple partitions.
> > What I want is for gpu_2h partition jobs to use the dedicated nodes
> > first, by adding Weight settings (e.g. schedule to gpu38/39 rather
> > than gpu36/37). However, the scheduler does not follow the weight
> > settings and schedules to gpu36/37 (e.g. with srun -p gpu_2h).
> > All the GPU nodes are idle and the billing is the same. Did I miss
> > something? Is there some limitation when a node serves multiple
> > partitions or uses GRES? Please advise. Thank you very much.
> >
> > Below are the relevant settings, which may help.
> > slurm.conf:
> > NodeName=gpu[36-37] Gres=gpu:titanxp:4 ThreadsPerCore=2 State=unknown Sockets=2 CPUs=40 CoresPerSocket=10 Weight=20
> > NodeName=gpu[38-39] Gres=gpu:titanxp:4 ThreadsPerCore=2 State=unknown Sockets=2 CPUs=40 CoresPerSocket=10 Weight=1
> >
> >
> > PartitionName=gpu_2h Nodes=gpu[36-39] Default=YES MaxTime=02:00:00 DefaultTime=02:00:00 MaxNodes=1 State=UP AllowQos=GPU
> > PartitionName=gpu_8h Nodes=gpu[31-37] MaxTime=08:00:00 DefaultTime=08:00:00 MaxNodes=1 State=UP AllowQos=GPU
> >
> >
> > # sinfo -N -O nodelist,partition,gres,weight
> >
> >
> > NODELIST            PARTITION           GRES                WEIGHT
> > gpu36               gpu_2h*             gpu:titanxp:4       20
> > gpu36               gpu_8h              gpu:titanxp:4       20
> > gpu37               gpu_2h*             gpu:titanxp:4       20
> > gpu37               gpu_8h              gpu:titanxp:4       20
> > gpu38               gpu_2h*             gpu:titanxp:4       1
> > gpu39               gpu_2h*             gpu:titanxp:4       1
> >
>
> You didn't mention which version of Slurm you are using. Weights are
> known to be broken in early 18.08 releases. I think it was fixed in
> 18.08.4, but you'd have to go back and read the release notes to
> confirm.
>
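The behaviour Andy expected follows the rule in the slurm.conf documentation for Weight: all else being equal, a job is allocated the lowest-weight nodes that satisfy its requirements. Below is a minimal Python sketch of that rule using the nodes from the sinfo listing above. It is a simplified model, not Slurm's select plugin code. The `Node` class, the `pick_node` function, and the name-based tie-break are hypothetical, and all GPUs are assumed idle, as stated in the question:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    partitions: frozenset
    weight: int
    idle_gpus: int


# Nodes as shown by `sinfo -N -O nodelist,partition,gres,weight` in the thread;
# every node has 4 idle GPUs (assumption: all nodes idle, as the question says).
NODES = [
    Node("gpu36", frozenset({"gpu_2h", "gpu_8h"}), 20, 4),
    Node("gpu37", frozenset({"gpu_2h", "gpu_8h"}), 20, 4),
    Node("gpu38", frozenset({"gpu_2h"}), 1, 4),
    Node("gpu39", frozenset({"gpu_2h"}), 1, 4),
]


def pick_node(nodes: list, partition: str, gpus: int) -> Optional[str]:
    """Simplified model: the lowest-weight node in `partition` with enough idle GPUs."""
    candidates = [n for n in nodes if partition in n.partitions and n.idle_gpus >= gpus]
    if not candidates:
        return None
    # Lower Weight is preferred; equal weights are broken by node name
    # (a hypothetical tie-break for this sketch, not Slurm's exact rule).
    return min(candidates, key=lambda n: (n.weight, n.name)).name


if __name__ == "__main__":
    print(pick_node(NODES, "gpu_2h", 1))  # gpu38
    print(pick_node(NODES, "gpu_8h", 1))  # gpu36
    # Once the dedicated nodes are fully busy, gpu_2h jobs spill onto the shared nodes.
    busy = [replace(n, idle_gpus=0) if n.weight == 1 else n for n in NODES]
    print(pick_node(busy, "gpu_2h", 1))  # gpu36
```

Under this model, an idle cluster sends gpu_2h jobs to gpu38/39 first. That matches the behaviour reported after the upgrade to 18.08.6.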


