[slurm-users] Help with preemption based on licenses
Oytun Peksel
Oytun.Peksel at semcon.com
Thu Dec 5 09:24:10 UTC 2019
Hi all,
It took me a while, but I think I have achieved what I was trying to do. I had to modify the cons_res plugin: I forked 19.05.2, modified the code, and recompiled it. It works for me. You have to define the licenses as generic resources (gres) and set --gres-flags=disable-binding. The result is that Slurm also releases all gres resources of jobs that are preempted by suspension.
I tried to modify cons_tres as well, but for some reason it does not work. I will try to figure that out if I have time in the future.
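As a rough illustration of the submission side only, here is a minimal Python sketch that assembles an sbatch command line requesting license tokens as a gres. The gres name "abaqus", the token count, and the script name run_abaqus.sh are hypothetical placeholders, and the sketch assumes the gres has already been defined in the cluster configuration. It only builds and prints the command; it does not call Slurm.

```python
import shlex


def build_sbatch_command(script, gres_name, token_count):
    """Return an sbatch argument list that requests license tokens as a gres."""
    if token_count < 1:
        raise ValueError("token_count must be at least 1")
    return [
        "sbatch",
        # Licenses are requested like any other generic resource: name:count.
        f"--gres={gres_name}:{token_count}",
        # The flag mentioned in the message above.
        "--gres-flags=disable-binding",
        script,
    ]


# Hypothetical values, for illustration only.
cmd = build_sbatch_command("run_abaqus.sh", "abaqus", 12)
print(shlex.join(cmd))
```

Building the argument list first and joining it only for display keeps quoting correct if the script path ever contains spaces.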
You can find the modified version at:
https://github.com/baytuni/slurm.git
Oytun Peksel
oytun.peksel at semcon.com
Mobile +46739205917
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Oytun Peksel
Sent: den 7 november 2019 08:48
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Help with preemption based on licenses
Thank you all for your input.
As a newbie in this area, my impression from what you have written is that, for most commercial software, a suspend/release-license/reacquire mechanism is not feasible.
(Answer to Mark)
What we are using here is an engineering software package called Abaqus. In Abaqus you can use token-based licenses, where the number of tokens depends on the number of cores used (and some other things). It checks out the license from a flex license server on submission, and if it gets suspended it releases the licenses. Another instance can then use the released tokens. If the initially suspended instance is later resumed, it cannot start unless there are enough tokens.
I have had no real problems with this mechanism. It works pretty well as long as I do not attempt to track licenses with Slurm.
My claim: since Slurm does not really integrate with license servers and license handling is pretty much up to the admin, it should not assume that no license can ever be released.
Another thing that puzzles me is:
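The checkout/suspend/resume behaviour described above can be sketched as a toy token pool in Python. This is only a model of the mechanism as described in this message, not the API of any real license server; the class name TokenPool, the job names, and the token counts are all hypothetical.

```python
class TokenPool:
    """Toy model of a token-based license pool (not a real license server API)."""

    def __init__(self, total):
        self.total = total
        self.held = {}  # job name -> tokens currently checked out

    @property
    def free(self):
        return self.total - sum(self.held.values())

    def checkout(self, job, tokens):
        """Check out tokens for a job; return False if too few are free."""
        if job in self.held:
            raise ValueError(f"{job} already holds tokens")
        if tokens > self.free:
            return False
        self.held[job] = tokens
        return True

    def release(self, job):
        """Return a job's tokens to the pool and report how many were held."""
        return self.held.pop(job, 0)

    def suspend(self, job):
        """A suspended job gives its tokens back, as described above."""
        return self.release(job)

    def resume(self, job, tokens):
        """A resumed job must check its tokens out again."""
        return self.checkout(job, tokens)


pool = TokenPool(total=20)
started = pool.checkout("low_prio", 15)             # fits: 20 tokens free
released = pool.suspend("low_prio")                 # 15 tokens go back
preemptor_started = pool.checkout("high_prio", 12)  # fits: 20 tokens free
resumed_early = pool.resume("low_prio", released)   # fails: only 8 free
pool.release("high_prio")                           # preemptor finishes
resumed_late = pool.resume("low_prio", released)    # fits again
print(started, preemptor_started, resumed_early, resumed_late)
```

The point of the sketch is the middle step: the suspended job cannot resume while the preempting job holds the tokens, which is the behaviour a scheduler would need to account for.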
AccountingStorageTRES=license/someSoftware
I would expect this to track the licenses defined either in slurm.conf or through sacctmgr, but it does not.
When I run:
scontrol show job
it does not show any licenses in the output:
TRES=cpu=23,mem=23G,node=1,billing=23
Or sacct --format=tres
shows just the default trackable resources.
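To make the check above repeatable, here is a small Python sketch that parses a TRES string like the one shown and looks for license entries. The helper names parse_tres and license_entries are hypothetical, and the "license/someSoftware=5" entry in the second string is an assumption about how a tracked license would appear, not output observed from Slurm.

```python
def parse_tres(tres):
    """Split a TRES string such as 'cpu=23,mem=23G' into a name -> value dict."""
    result = {}
    for item in tres.split(","):
        if not item:
            continue
        name, _, value = item.partition("=")
        result[name] = value
    return result


def license_entries(tres):
    """Return only the entries whose name starts with 'license/'."""
    return {k: v for k, v in parse_tres(tres).items() if k.startswith("license/")}


# The string reported in this message: no license entries present.
observed = "cpu=23,mem=23G,node=1,billing=23"
print(license_entries(observed))

# Hypothetical string, assuming a tracked license were listed this way.
hypothetical = "cpu=23,mem=23G,node=1,billing=23,license/someSoftware=5"
print(license_entries(hypothetical))
```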
Oytun Peksel
oytun.peksel at semcon.com
Mobile +46739205917
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Chris Samuel
Sent: den 7 november 2019 08:03
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Help with preemption based on licenses
On Wednesday, 6 November 2019 7:36:57 AM PST Oytun Peksel wrote:
> GPU part of the discussion is beyond my knowledge so I assumed it
> would be possible to release it.
If you simply suspend a job then the application does not exit; it just gets stopped, and so it keeps holding various resources and file handles open, including the GPU and the resources on it.
[...]
> After all software licenses might be the most expensive resource to
> utilize where preemption might sometimes be inevitable.
I think the thing to remember with software licensing systems is that we are not the users or customers for that vendor; the ISV whose software you are using is their customer. So the vendor's aim is to ensure that the ISV sells as many licenses for its software as possible.
If you just suspend an application that has checked licenses out, and then use some other program to make the license server think it has died and release them, then I suspect that when you unsuspend it, it will be very confused: it will think it still has these licenses checked out, but the license server won't. I suspect that would not lead to a happy program, user or license server.
So for both GPUs and licenses I suspect you really do want either cancel or requeue for this.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA