[slurm-users] how to restrict jobs
Renfro, Michael
Renfro at tntech.edu
Wed May 6 14:13:14 UTC 2020
To make sure I’m reading this correctly, you have a software license that lets you run jobs on up to 4 nodes at once, regardless of how many CPUs you use? That is, you could run any one of the following sets of jobs:
- four 1-node jobs,
- two 2-node jobs,
- one 1-node and one 3-node job,
- two 1-node jobs and one 2-node job,
- one 4-node job,
simultaneously? And the license isn’t node-locked to specific nodes by MAC address or anything similar? But if you try to run jobs beyond what I’ve listed above, you run out of licenses, and you want those later jobs to be held until licenses are freed up?
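Every set above follows one rule: a job needs one license per node, and the four licenses form a shared pool. Here is a minimal Python sketch of that accounting. It assumes strict first-come-first-served order with no backfill, which is a simplification; Slurm's real scheduler is more involved. The job names are made up for illustration.

```python
# Simplified license-pool model, NOT Slurm's scheduler: assumes one license
# per node, strict submission order, and no backfill.
TOTAL_LICENSES = 4


def schedule(requests, total=TOTAL_LICENSES):
    """Split (job_name, nodes) requests into running and pending job names."""
    free = total
    running, pending = [], []
    for name, nodes in requests:
        # Once any job is held, later jobs wait behind it (FIFO).
        if not pending and nodes <= free:
            free -= nodes
            running.append(name)
        else:
            pending.append(name)
    return running, pending


# Each set of jobs from the list above fits within the 4-license pool.
for sizes in ([1, 1, 1, 1], [2, 2], [1, 3], [1, 1, 2], [4]):
    jobs = [(f"job{i}", n) for i, n in enumerate(sizes)]
    print(sizes, schedule(jobs))
```

A third 2-node job submitted after two 2-node jobs lands in `pending` instead of starting and failing, which is the hold-until-free behavior being asked for.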
If all of those questions have an answer of ‘yes’, I think you want the remote license section of https://slurm.schedmd.com/licenses.html, with something like:
sacctmgr add resource name=software_name count=4 percentallowed=100 server=flex_host servertype=flexlm type=license
and have users submit jobs with a '-L software_name:N' flag, where N is the number of nodes they want to run on.
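If jobs go through a submission wrapper, keeping N in sync with the node count can be automated. The following is a hypothetical Python helper that only builds the command line. `software_name` is the placeholder license name from above, `job.sh` is a made-up script name, and `--licenses` is the long form of sbatch's `-L`. It's worth checking the exact license name Slurm expects with `scontrol show licenses` before relying on it.

```python
import shlex


def sbatch_argv(script, nodes, license_name="software_name"):
    """Build an sbatch command requesting one license per node (hypothetical)."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return [
        "sbatch",
        f"--nodes={nodes}",
        # One node-license per node, so N always equals the node count.
        f"--licenses={license_name}:{nodes}",
        script,
    ]


print(shlex.join(sbatch_argv("job.sh", 2)))
# sbatch --nodes=2 --licenses=software_name:2 job.sh
```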
> On May 6, 2020, at 5:33 AM, navin srivastava <navin.altair at gmail.com> wrote:
>
> Thanks, Michael.
>
> Actually, one application's licenses are node-based, and we have 4 node licenses (not tied to fixed nodes). We have several nodes; when jobs land on any 4 random nodes, the application runs on those nodes only. After that, it fails if a job goes to any other node.
>
> Can we define a custom variable and set it at the node level, so that when a user submits a job, it passes that variable and the job lands on those specific nodes?
> I do not want to create a separate partition.
>
> Is there any other way to achieve this?
>
> Regards
> Navin.
>
> On Tue, May 5, 2020 at 7:46 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> Haven’t done it yet myself, but it’s on my todo list.
>
> But I’d assume that if you use the FlexLM or RLM parts of that documentation, Slurm would query the remote license server periodically and hold the job until the necessary licenses were available.
>
> > On May 5, 2020, at 8:37 AM, navin srivastava <navin.altair at gmail.com> wrote:
> >
> > Thanks, Michael,
> >
> > Yes, I have gone through it, but the licenses are remote licenses, and they are used outside the cluster as well, not only through Slurm.
> > So basically I am interested to know how we can update the database dynamically to get the exact value at any point in time.
> > I mean, query the license server and update the database accordingly. Does Slurm automatically update the value based on usage?
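If the count does have to be refreshed from outside Slurm, one detail matters: the license server's "in use" figure also includes licenses held by running Slurm jobs. Below is a minimal sketch of just the arithmetic. It assumes some external script can read the total and in-use figures from the license server, plus the number held by Slurm jobs. How those values are fetched and written back to Slurm is not shown, and the function name is hypothetical.

```python
def slurm_license_count(total, in_use_on_server, in_use_by_slurm):
    """Licenses Slurm may account for, excluding those held outside Slurm.

    total:            licenses the server owns (e.g. 4)
    in_use_on_server: licenses the server reports as checked out (all users)
    in_use_by_slurm:  licenses currently held by running Slurm jobs
    """
    # Licenses checked out by non-Slurm users; clamp in case the two
    # readings were taken at slightly different moments.
    external = max(in_use_on_server - in_use_by_slurm, 0)
    return max(total - external, 0)


# 4 licenses, 3 checked out, 2 of them by Slurm jobs: 1 is used outside,
# so Slurm may account for 3 in total.
print(slurm_license_count(4, 3, 2))
# 3
```

Subtracting the raw server figure instead would double-count the licenses Slurm's own jobs already hold.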
> >
> >
> > Regards
> > Navin.
> >
> >
> > On Tue, May 5, 2020 at 7:00 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> > Have you seen https://slurm.schedmd.com/licenses.html already? If the software is just for use inside the cluster, one Licenses= line in slurm.conf plus users submitting with the -L flag should suffice. You should be able to set that license value to 4 if it’s licensed per node and you can run up to 4 jobs simultaneously, to 4*NCPUS if it’s licensed per CPU, or to 1 if it’s a single license good for one run on 1-4 nodes.
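The three cases map to a single Licenses= value. Here is a small Python sketch of that arithmetic, using the 4-node, ~80-core figures from the original question (so about 20 CPUs per node, an assumption). `software_name` and the model labels are placeholders.

```python
def license_count(model, nodes=4, cpus_per_node=None):
    """Licenses= value for the licensing models described above."""
    if model == "per_node":      # one license per node: up to 4 nodes at once
        return nodes
    if model == "per_cpu":       # one license per CPU: 4 * NCPUS
        if cpus_per_node is None:
            raise ValueError("per_cpu needs cpus_per_node")
        return nodes * cpus_per_node
    if model == "single_run":    # one license covers one run on 1-4 nodes
        return 1
    raise ValueError(f"unknown model: {model}")


def slurm_conf_line(name, count):
    """Format the corresponding slurm.conf line."""
    return f"Licenses={name}:{count}"


print(slurm_conf_line("software_name", license_count("per_node")))
# Licenses=software_name:4
print(slurm_conf_line("software_name", license_count("per_cpu", cpus_per_node=20)))
# Licenses=software_name:80
```

Jobs would then request, with -L, one license per node, one per CPU, or one per run, respectively.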
> >
> > There are also options to query a FlexLM or RLM server for license management.
> >
> > --
> > Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> > 931 372-3601 / Tennessee Tech University
> >
> > > On May 5, 2020, at 7:54 AM, navin srivastava <navin.altair at gmail.com> wrote:
> > >
> > > Hi Team,
> > >
> > > We have an application whose licenses are limited; it scales up to 4 nodes (~80 cores).
> > > So if 4 nodes are already in use, a job that runs on a 5th node fails.
> > > We want to put a restriction so that the application can't run beyond those 4 nodes; instead of failing, the job should stay in the queue.
> > > I do not want to keep a separate partition to achieve this configuration. Is there a way to achieve this using some dynamic resource that checks the license count on the fly and, if the limit is reached, keeps the job in the queue?
> > >
> > > Regards
> > > Navin.
> > >
> > >
> > >
> >
>