[slurm-users] Slurm configuration, Weight Parameter

Sistemas NLHPC sistemas at nlhpc.cl
Thu Dec 5 18:00:59 UTC 2019


Thanks, Jeff!

We upgraded Slurm to 18.08.4 and the Weight parameter now works!  Is it also
possible to use Weight together with the priority/multifactor plugin?
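
For context, here is a minimal sketch of what we are aiming for (the
PriorityWeight* values below are only placeholders, not our production
settings; the node lines are from our test cluster):

====

# Job-ordering side: multifactor priority plugin
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000

# Node-selection side: lower Weight means the node is picked first
NodeName=devcn002 RealMemory=3007 Weight=1   State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn050 RealMemory=3007 Weight=200 State=idle Sockets=2 CoresPerSocket=1

====

Our assumption is that node Weight only influences which nodes an allocation
selects, while priority/multifactor only influences the order in which jobs
are considered, so the two should not conflict, but we would like to confirm.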

Thanks in advance

Regards

On Tue, Dec 3, 2019 at 17:37, Sarlo, Jeffrey S (<JSarlo at central.uh.edu>)
wrote:

> Which version of Slurm are you using?  I know the early 18.08 releases,
> prior to 18.08.04, had a bug that kept weights from working.  Once we got
> past 18.08.04, weights worked for us.
>
>
>
> Jeff
>
> University of Houston - HPC
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *Sistemas NLHPC
> *Sent:* Tuesday, December 03, 2019 12:33 PM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] Slurm configuration, Weight Parameter
>
>
>
> Hi Renfro
>
>
>
> I am testing with this configuration, kept as simple and clean as
> possible:
>
>
>
> ====
>
>
>
> NodeName=devcn050 RealMemory=3007 Features=3007MB Weight=200 State=idle
> Sockets=2 CoresPerSocket=1
> NodeName=devcn002 RealMemory=3007 Features=3007MB Weight=1 State=idle
> Sockets=2 CoresPerSocket=1
> NodeName=devcn001 RealMemory=2000 Features=2000MB Weight=500 State=idle
> Sockets=2 CoresPerSocket=1
>
> PartitionName=slims Nodes=devcn001,devcn002,devcn050 Default=yes
> Shared=yes State=up
>
>
>
> ====
>
>
>
> Does your configuration need any extra plugin or parameter for the Weight
> option to take effect?
>
>
>
> The configuration does not work as expected.
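>
> To double-check what Slurm has registered for each node, I am inspecting
> the weights like this (a quick sketch; %w is the sinfo format field for the
> node scheduling weight):
>
> ====
>
> # One line per node with its scheduling weight
> sinfo -N -o "%N %w"
>
> # Or for a single node
> scontrol show node devcn001 | grep -i weight
>
> ====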
>
>
>
> Regards,
>
>
>
> On Sat, Nov 30, 2019 at 10:30, Renfro, Michael (<Renfro at tntech.edu>)
> wrote:
>
> We’ve been using that weighting scheme for a year or so, and it works as
> expected. Not sure how Slurm would react to multiple NodeName=DEFAULT lines
> like you have, but here are our node settings and a subset of our partition
> settings.
>
> In our environment, we’d often have lots of idle cores on GPU nodes, since
> those jobs tend to be GPU-bound rather than CPU-bound. So in one of our
> interactive partitions, we let non-GPU jobs take up to 12 cores of a GPU
> node. Additionally, we have three memory configurations in our main batch
> partition. We want to bias jobs to running on the smaller-memory nodes by
> default. And the same principle applies to our GPU partition, where the
> smaller-memory GPU nodes get jobs before the larger-memory GPU node.
>
> =====
>
> NodeName=gpunode[001-003]  CoresPerSocket=14 RealMemory=382000 Sockets=2
> ThreadsPerCore=1 Weight=10011 Gres=gpu:2
> NodeName=gpunode004  CoresPerSocket=14 RealMemory=894000 Sockets=2
> ThreadsPerCore=1 Weight=10021 Gres=gpu:2
> NodeName=node[001-022]  CoresPerSocket=14 RealMemory=62000 Sockets=2
> ThreadsPerCore=1 Weight=10201
> NodeName=node[023-034]  CoresPerSocket=14 RealMemory=126000 Sockets=2
> ThreadsPerCore=1 Weight=10211
> NodeName=node[035-040]  CoresPerSocket=14 RealMemory=254000 Sockets=2
> ThreadsPerCore=1 Weight=10221
>
> PartitionName=any-interactive Default=NO MinNodes=1 MaxNodes=4
> MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 PriorityTier=1
> DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0
> PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL
> LLN=NO MaxCPUsPerNode=12 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0
> State=UP Nodes=node[001-040],gpunode[001-004]
>
> PartitionName=batch Default=YES MinNodes=1 MaxNodes=40
> DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL
> PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO
> Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000
> AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO
> OverTimeLimit=0 State=UP Nodes=node[001-040]
>
> PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00
> MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1
> DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0
> PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL
> LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO
> OverTimeLimit=0 State=UP Nodes=gpunode[001-004]
>
> =====
>
> > On Nov 29, 2019, at 8:09 AM, Sistemas NLHPC <sistemas at nlhpc.cl> wrote:
> >
> > Hi All,
> >
> > Thanks all for your posts
> >
> > Reading the Slurm documentation and other sites such as Niflheim
> > https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight (Ole
> > Holm Nielsen), my understanding is that the "Weight" parameter assigns a
> > value to each node so that lower-weight nodes are preferred when jobs are
> > scheduled. But I have not obtained positive results.
> >
> > Thanks in advance
> >
> > Regards
> >
> > On Sat, Nov 23, 2019 at 14:18, Chris Samuel (<chris at csamuel.org>)
> > wrote:
> > On 23/11/19 9:14 am, Chris Samuel wrote:
> >
> > > My gut instinct (and I've never tried this) is to make the 3GB nodes
> be
> > > in a separate partition that is guarded by AllowQos=3GB and have a QOS
> > > called "3GB" that uses MinTRESPerJob to require jobs to ask for more
> > > than 2GB of RAM to be allowed into the QOS.
> >
> > Of course there's nothing to stop a user requesting more memory than
> > they need to get access to these nodes, but that's a social issue not a
> > technical one. :-)
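> >
> > Roughly, the layout I had in mind would be something like this (an
> > untested sketch; the partition and node names are just placeholders, and
> > the mem TRES is in MB):
> >
> > =====
> >
> > # QOS that only admits jobs requesting more than 2GB of memory
> > sacctmgr add qos 3GB
> > sacctmgr modify qos 3GB set MinTRESPerJob=mem=2049
> >
> > # slurm.conf: the 3GB nodes live in a partition guarded by that QOS
> > PartitionName=bigmem Nodes=big[001-004] AllowQos=3GB State=UP
> >
> > =====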
> >
> > --
> >   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> >
>
>