[slurm-users] how to restrict jobs
navin srivastava
navin.altair at gmail.com
Wed May 6 14:46:09 UTC 2020
To explain in more detail:
Jobs are submitted by core count at any time and can land on any random
nodes, but the application is limited to 4 nodes. (The license has some
intelligence: it counts the nodes in use, and once it reaches 4 it will not
allow any more nodes. Yes, it does not depend on the number of cores
available on the nodes.)
Case 1: 4 jobs are running with 4 cores each on 4 nodes [node1, node2, node3
and node4].
If SLURM then assigns a fifth job with 4 cores to any one of node1, node2,
node3 or node4, the license is granted.
Case 2: 4 jobs are running with 4 cores each on 4 nodes [node1, node2, node3
and node4].
If SLURM then assigns a fifth job with 4 cores to node5, the license is not
granted (a "license not found" error occurred in this case).
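
The two cases above can be modelled with a small sketch. It assumes the
license simply tracks the set of distinct nodes currently in use, caps that
set at 4, and frees a node slot once no job is left on that node (the last
point is an assumption, not something confirmed here). The names NodeLicense,
checkout and checkin are hypothetical; they are not part of SLURM or of the
license manager, and this is not how the vendor necessarily implements it.

```python
class NodeLicense:
    """Toy model of a node-counted license (hypothetical, for illustration)."""

    def __init__(self, max_nodes=4):
        self.max_nodes = max_nodes
        self.jobs_per_node = {}  # node name -> number of running jobs

    def checkout(self, node):
        """Return True if a job starting on `node` would get a license."""
        is_new_node = node not in self.jobs_per_node
        if is_new_node and len(self.jobs_per_node) >= self.max_nodes:
            return False  # a new node would exceed the node limit
        self.jobs_per_node[node] = self.jobs_per_node.get(node, 0) + 1
        return True

    def checkin(self, node):
        """Release one job; free the node slot when no jobs remain on it."""
        remaining = self.jobs_per_node.get(node, 0) - 1
        if remaining > 0:
            self.jobs_per_node[node] = remaining
        else:
            self.jobs_per_node.pop(node, None)


lic = NodeLicense(max_nodes=4)
first_four = [lic.checkout(n) for n in ("node1", "node2", "node3", "node4")]
case1 = lic.checkout("node2")  # Case 1: fifth job on an already-used node
case2 = lic.checkout("node5")  # Case 2: fifth job on a fifth node
print(first_four, case1, case2)
```

In this model the core count never appears, which matches the point that only
the number of distinct nodes matters.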
Regards
Navin.
On Wed, May 6, 2020 at 7:47 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> To make sure I’m reading this correctly, you have a software license that
> lets you run jobs on up to 4 nodes at once, regardless of how many CPUs you
> use? That is, you could run any one of the following sets of jobs:
>
> - four 1-node jobs,
> - two 2-node jobs,
> - one 1-node and one 3-node job,
> - two 1-node and one 2-node jobs,
> - one 4-node job,
>
> simultaneously? And the license isn’t node-locked to specific nodes by MAC
> address or anything similar? But if you try to run jobs beyond what I’ve
> listed above, you run out of licenses, and you want those later jobs to be
> held until licenses are freed up?
>
> If all of those questions have an answer of ‘yes’, I think you want the
> remote license section of https://slurm.schedmd.com/licenses.html,
> something like:
>
> sacctmgr add resource name=software_name count=4 percentallowed=100
> server=flex_host servertype=flexlm type=license
>
> and submit jobs with a '-L software_name:N' flag, where N is the number of
> nodes you want to run on.
>
> > On May 6, 2020, at 5:33 AM, navin srivastava <navin.altair at gmail.com>
> wrote:
> >
> > Thanks Michael.
> >
> > Actually, one application's license is node-based, and we have a 4-node
> > license (not tied to fixed nodes). We have several nodes, but once jobs
> > land on any 4 random nodes, the application runs on those nodes only.
> > After that, it fails if a job goes to other nodes.
> >
> > Can we define a custom variable, set it at the node level, and have the
> > user pass that variable at submission so that the job lands on those
> > specific nodes?
> > I do not want to create a separate partition.
> >
> > Is there any way to achieve this by some other method?
> >
> > Regards
> > Navin.
> >
> > On Tue, May 5, 2020 at 7:46 PM Renfro, Michael <Renfro at tntech.edu>
> wrote:
> > Haven’t done it yet myself, but it’s on my todo list.
> >
> > But I’d assume that if you use the FlexLM or RLM parts of that
> > documentation, Slurm would query the remote license server
> > periodically and hold the job until the necessary licenses were available.
> >
> > > On May 5, 2020, at 8:37 AM, navin srivastava <navin.altair at gmail.com>
> wrote:
> > >
> > > Thanks Michael,
> > >
> > > Yes, I have gone through it, but the licenses are remote licenses and
> > > will also be used outside of Slurm, not only within it.
> > > So basically I am interested to know how we can update the database
> > > dynamically to get the exact value at that point in time.
> > > I mean, query the license server and update the database accordingly.
> > > Does Slurm automatically update the value based on usage?
> > >
> > >
> > > Regards
> > > Navin.
> > >
> > >
> > > On Tue, May 5, 2020 at 7:00 PM Renfro, Michael <Renfro at tntech.edu>
> wrote:
> > > Have you seen https://slurm.schedmd.com/licenses.html already? If the
> > > software is just for use inside the cluster, one Licenses= line in
> > > slurm.conf plus users submitting with the -L flag should suffice. You
> > > should be able to set that license value to 4 if it’s licensed per node
> > > and you can run up to 4 jobs simultaneously, or 4*NCPUS if it’s licensed
> > > per CPU, or 1 if it’s a single license good for one run from 1-4 nodes.
> > >
> > > There are also options to query a FlexLM or RLM server for license
> > > management.
> > >
> > > --
> > > Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> > > 931 372-3601 / Tennessee Tech University
> > >
> > > > On May 5, 2020, at 7:54 AM, navin srivastava <navin.altair at gmail.com>
> wrote:
> > > >
> > > > Hi Team,
> > > >
> > > > We have an application whose licenses are limited; it scales up to 4
> > > > nodes (~80 cores).
> > > > So if 4 nodes are full, a job on a 5th node fails.
> > > > We want to put a restriction in place so that the application can't
> > > > start executing beyond the 4 nodes and fail; it should stay in the
> > > > queue instead.
> > > > I do not want to keep a separate partition to achieve this config. Is
> > > > there a way to achieve this scenario using some dynamic resource that
> > > > can check the license variable on the fly and, if the limit is reached,
> > > > keep the job in the queue?
> > > >
> > > > Regards
> > > > Navin.
> > > >
> > > >
> > > >
> > >
> >
>
>