[slurm-users] how to restrict jobs

navin srivastava navin.altair at gmail.com
Wed May 6 15:40:54 UTC 2020


Is there no way to define a custom variable at the node level and then
pass the same variable in the job request, so that the job lands only on
those nodes?


Regards
Navin

On Wed, May 6, 2020, 21:04 Renfro, Michael <Renfro at tntech.edu> wrote:

> Ok, then regular license accounting won’t work.
>
> Somewhat tested, but should work or at least be a starting point. Given a
> job number JOBID that’s already running with this license on one or more
> nodes:
>
>   sbatch -w $(scontrol show job JOBID | grep ' NodeList=' | cut -d= -f2) -N 1
>
> should start a one-node job on an available node being used by JOBID. Add
> other parameters as required for cpus-per-task, time limits, or whatever
> else is needed. If you start the larger jobs first, and let the later jobs
> fill in on idle CPUs on those nodes, it should work.
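
A minimal sketch of the one-liner above wrapped as shell functions, so the node list is checked before anything is submitted. The function names extract_nodelist and submit_alongside are hypothetical; scontrol show job, sbatch -w and sbatch -N are the same commands and options used above. Whether Slurm accepts a multi-node list together with -N 1 is not verified here.

```shell
# Print the NodeList value from `scontrol show job` output read on stdin.
# The leading space in the pattern skips ReqNodeList= and ExcNodeList=.
extract_nodelist() {
    grep ' NodeList=' | cut -d= -f2
}

# Submit a one-node job restricted to the nodes used by a running job.
# $1 is the running job's ID; all remaining arguments go to sbatch unchanged.
submit_alongside() {
    jobid=$1
    shift
    nodes=$(scontrol show job "$jobid" | extract_nodelist)
    if [ -z "$nodes" ]; then
        echo "no NodeList found for job $jobid" >&2
        return 1
    fi
    sbatch -w "$nodes" -N 1 "$@"
}
```

Usage would look like `submit_alongside JOBID --cpus-per-task=4 job.sh`, where job.sh stands in for the real batch script.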
>
> > On May 6, 2020, at 9:46 AM, navin srivastava <navin.altair at gmail.com> wrote:
> >
> > To explain in more detail:
> >
> > Jobs are submitted by core count at any time and may go to any random
> > nodes, but the application is limited to 4 nodes. (The license has some
> > intelligence: it counts the nodes in use, and once 4 is reached it does
> > not allow any more nodes. It does not depend on the number of cores
> > available on the nodes.)
> >
> > Case 1: 4 jobs are running with 4 cores each on 4 nodes [node1, node2,
> > node3 and node4].
> >              If SLURM assigns a fifth job with 4 cores to any one of
> > node1, node2, node3 or node4, the license is granted.
> >
> > Case 2: 4 jobs are running with 4 cores each on 4 nodes [node1, node2,
> > node3 and node4].
> >              If SLURM assigns a fifth job with 4 cores to node5, the
> > license is not granted [a "license not found" error occurs in this case].
> >
> > Regards
> > Navin.
> >
> >
> > On Wed, May 6, 2020 at 7:47 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> > To make sure I’m reading this correctly, you have a software license
> > that lets you run jobs on up to 4 nodes at once, regardless of how many
> > CPUs you use? That is, you could run any one of the following sets of
> > jobs:
> >
> > - four 1-node jobs,
> > - two 2-node jobs,
> > - one 1-node and one 3-node job,
> > - two 1-node and one 2-node jobs,
> > - one 4-node job,
> >
> > simultaneously? And the license isn’t node-locked to specific nodes by
> > MAC address or anything similar? But if you try to run jobs beyond what
> > I’ve listed above, you run out of licenses, and you want those later
> > jobs to be held until licenses are freed up?
> >
> > If all of those questions have an answer of ‘yes’, I think you want the
> > remote license part of https://slurm.schedmd.com/licenses.html,
> > something like:
> >
> >   sacctmgr add resource name=software_name count=4 percentallowed=100 server=flex_host servertype=flexlm type=license
> >
> > and submit jobs with a '-L software_name:N' flag, where N is the number
> > of nodes you want to run on.
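
A minimal sketch of the submission side, assuming the resource has already been added under the name software_name as in the sacctmgr line above. The helper names license_request and submit_licensed are hypothetical; -N and -L are standard sbatch options. For a remote license the name may need the server suffix (for example software_name@flex_host), which is an assumption to check against the licenses documentation.

```shell
# Build the license request string for a job spanning N nodes,
# asking for one license token per node as suggested above.
license_request() {
    name=$1
    nodes=$2
    printf '%s:%s\n' "$name" "$nodes"
}

# Submit a batch script on N nodes while requesting N license tokens.
# $1 = license name, $2 = node count, remaining arguments go to sbatch.
submit_licensed() {
    name=$1
    nodes=$2
    shift 2
    sbatch -N "$nodes" -L "$(license_request "$name" "$nodes")" "$@"
}
```

Usage would look like `submit_licensed software_name 2 job.sh`, where job.sh stands in for the real batch script.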
> >
> > > On May 6, 2020, at 5:33 AM, navin srivastava <navin.altair at gmail.com> wrote:
> > >
> > > Thanks Michael.
> > >
> > > Actually, one application's license is node-based, and we have a 4-node
> > > license (not tied to fixed nodes). We have several nodes, but once jobs
> > > land on any 4 random nodes, the application runs on those nodes only.
> > > After that, it fails if a job goes to other nodes.
> > >
> > > Can we define a custom variable, set it at the node level, and have the
> > > user pass that variable at submission so that the job lands on those
> > > specific nodes?
> > > I do not want to create a separate partition.
> > >
> > > Is there any way to achieve this by some other method?
> > >
> > > Regards
> > > Navin.
> > >
> > >
> > > On Tue, May 5, 2020 at 7:46 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> > > Haven’t done it yet myself, but it’s on my todo list.
> > >
> > > But I’d assume that if you use the FlexLM or RLM parts of that
> > > documentation, Slurm would query the remote license server
> > > periodically and hold the job until the necessary licenses were
> > > available.
> > >
> > > > On May 5, 2020, at 8:37 AM, navin srivastava <navin.altair at gmail.com> wrote:
> > > >
> > > > Thanks Michael,
> > > >
> > > > Yes, I have gone through it, but the licenses are remote licenses and
> > > > will also be used outside Slurm, not only within it.
> > > > So basically I am interested to know how we can update the database
> > > > dynamically to get the exact value at that point in time.
> > > > I mean, query the license server and update the database accordingly.
> > > > Does Slurm automatically update the value based on usage?
> > > >
> > > >
> > > > Regards
> > > > Navin.
> > > >
> > > >
> > > > On Tue, May 5, 2020 at 7:00 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> > > > Have you seen https://slurm.schedmd.com/licenses.html already? If
> > > > the software is just for use inside the cluster, one Licenses= line in
> > > > slurm.conf plus users submitting with the -L flag should suffice. You
> > > > should be able to set that license value to 4 if it’s licensed per node
> > > > and you can run up to 4 jobs simultaneously, or 4*NCPUS if it’s
> > > > licensed per CPU, or 1 if it’s a single license good for one run from
> > > > 1-4 nodes.
> > > >
> > > > There are also options to query a FlexLM or RLM server for license
> > > > management.
> > > >
> > > > --
> > > > Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> > > > 931 372-3601     / Tennessee Tech University
> > > >
> > > > > On May 5, 2020, at 7:54 AM, navin srivastava <navin.altair at gmail.com> wrote:
> > > > >
> > > > > Hi Team,
> > > > >
> > > > > We have an application whose licenses are limited. It scales up to 4
> > > > > nodes (~80 cores).
> > > > > So if 4 nodes are full, a job on a 5th node fails.
> > > > > We want to put a restriction in place so that the application can't
> > > > > execute beyond the 4 nodes and fail; instead the job should stay in
> > > > > the queued state.
> > > > > I do not want to keep a separate partition to achieve this
> > > > > configuration. Is there a way to achieve this using some dynamic
> > > > > resource that can check the license variable on the fly and, if the
> > > > > limit is reached, keep the job in the queue?
> > > > >
> > > > > Regards
> > > > > Navin.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>