[slurm-users] Handling idle sessions
datakid at gmail.com
Sun May 27 04:22:12 MDT 2018
On 27 May 2018 at 18:56, Nadav Toledo <nadavtoledo at cs.technion.ac.il> wrote:
> Hey Lachlan,
> Can you specify how/where you set the walltime and which factor you use in
> the accounting system to deprioritise?
Walltime is set in slurm.conf, per partition. You can set DefaultTime or
MaxTime or both. Search for those terms in the slurm.conf documentation.
The accounting system is using FairShare/Fair Tree.
PDF of presentation -> https://slurm.schedmd.com/SC14/BYU_Fair_Tree.pdf
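
As a rough sketch of the two pieces above (not our actual config), a
slurm.conf fragment could look something like this. The partition name,
node list, MaxTime, half-life and weights are all made up for illustration;
only the 40 minute default comes from this thread. It also assumes
accounting through slurmdbd is already set up, since fair-share needs the
usage data.

```
# slurm.conf (fragment); names and values are illustrative assumptions

# Short default walltime, with a longer hard cap for jobs that ask for it.
PartitionName=gpu Nodes=gpu[01-04] Default=YES DefaultTime=00:40:00 MaxTime=1-00:00:00 State=UP

# Multifactor priority, with fair-share weighted well above job age.
PriorityType=priority/multifactor
# On releases from around the time of this thread, Fair Tree was enabled
# with this flag; I believe later releases made it the default.
PriorityFlags=FAIR_TREE
PriorityDecayHalfLife=7-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
```

With something like that in place, a user who needs more than the default
asks for it explicitly, e.g. srun --time=04:00:00 --gres=gpu:1 jupyter-lab,
and the extra usage then counts against their fair-share.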
> Thanks, Nadav
> On 27/05/2018 11:34, Lachlan Musicman wrote:
> On 27 May 2018 at 18:23, Nadav Toledo <nadavtoledo at cs.technion.ac.il> wrote:
>> Hello forum,
>> I have been trying to deal with idle sessions for some time, and haven't
>> found a solution I am happy with.
>> The scenario is as follows: users run jupyter-lab through srun (which is
>> fine, and even encouraged by me) on an image processing cluster with GPUs.
>> The problem is that I want some way to email users, or cancel their job,
>> if their session is idle for X hours.
>> The w command and xprintidle cannot be used, since they both work with
>> ssh but not with Slurm (I checked that).
>> Writing a script is not as easy as one might think. If I run a script in
>> the admin user's scope, I later need to figure out which idle GPU belongs
>> to which Slurm job.
>> Running a script in the user's scope is probably a better idea, but how?
>> crontab runs even when the user is not logged in, so how can I force users
>> to run something only when the job starts?
>> Perhaps some combination of sreport and TRES?
> Hmm. We address this with accounting. A tight walltime (40 minutes)
> means that most jobs run without worrying about walltime. But some will
> need to set it. The accounting system keeps people honest by making
> "hogging" of resources bad for a user's job priority, in that their
> next job will be deprioritised.
> By letting people know that their next job will be deprioritised if they
> waste resources, we find our users behave responsibly.