[slurm-users] Help with preemption based on licenses
Chris Samuel
chris at csamuel.org
Thu Nov 7 07:03:01 UTC 2019
On Wednesday, 6 November 2019 7:36:57 AM PST Oytun Peksel wrote:
> GPU part of the discussion is beyond my knowledge so I assumed it would be
> possible to release it.
If you simply suspend a job, the application does not exit. It just gets
stopped, so it keeps holding its resources and open file handles, and that
includes the GPU and the resources on it.
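To illustrate the point about a stopped process holding on to what it has, here is a minimal Python sketch. It assumes a POSIX system and that suspension behaves like SIGSTOP. The file lock is only a stand-in for a GPU context or similar resource, and the names CHILD, lock_is_free and demo are made up for this example.

```python
import fcntl
import os
import signal
import subprocess
import sys
import tempfile
import time

# Child program: take an exclusive lock (stand-in for a GPU or similar
# resource), announce that it is held, then pretend to work.
CHILD = """
import fcntl, sys, time
f = open(sys.argv[1], "w")
fcntl.flock(f, fcntl.LOCK_EX)
print("ready", flush=True)
time.sleep(60)
"""


def lock_is_free(path):
    """Return True if an exclusive lock on path can be taken right now."""
    with open(path, "w") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # somebody else still holds the lock
        fcntl.flock(f, fcntl.LOCK_UN)
        return True


def demo():
    """Compare suspending (SIGSTOP) with cancelling (SIGKILL) the child."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        child.stdout.readline()             # wait until the child holds the lock
        os.kill(child.pid, signal.SIGSTOP)  # "suspend": stopped, not exited
        time.sleep(0.2)
        held_while_suspended = not lock_is_free(path)
        child.kill()                        # "cancel": the process really exits
        child.wait()
        free_after_cancel = lock_is_free(path)
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()
        child.stdout.close()
        os.unlink(path)
    return held_while_suspended, free_after_cancel


if __name__ == "__main__":
    print(demo())
```

The first value reports whether the lock was still held while the child was stopped, the second whether it became free once the child had actually exited.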
[...]
> After all software licenses might be the most expensive resource to utilize
> where preemption might sometimes be inevitable.
I think the thing to remember with software licensing systems is that we are
not the users or customers of the licensing vendor; the ISV whose software you
are using is their customer. So the vendor's aim is to try and ensure that the
ISV sells as many licenses for its software as possible.
If you just suspend an application that has checked licenses out, and then use
some other program to make the license server think it has died and release
them, then I suspect it will be very confused when you unsuspend it: it will
think it still has these licenses checked out, but the license server won't. I
suspect that would not lead to a happy program, user or license server.
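As a rough picture of that mismatch, here is a toy Python model. It is not how any real license manager works; ToyLicenseServer, ToyClient, force_release and scenario are all hypothetical names invented for the illustration.

```python
class ToyLicenseServer:
    """Hypothetical server that tracks which clients hold a seat."""

    def __init__(self, seats):
        self.seats = seats
        self.holders = set()

    def checkout(self, client_id):
        if len(self.holders) >= self.seats:
            return False
        self.holders.add(client_id)
        return True

    def force_release(self, client_id):
        # Stand-in for an external tool telling the server the client died.
        self.holders.discard(client_id)

    def still_holds(self, client_id):
        return client_id in self.holders


class ToyClient:
    """Hypothetical application that remembers its own checkout."""

    def __init__(self, client_id, server):
        self.client_id = client_id
        self.server = server
        self.believes_licensed = server.checkout(client_id)

    def resume(self):
        # After being unsuspended the client still trusts its own state.
        if self.believes_licensed and not self.server.still_holds(self.client_id):
            return "mismatch"
        return "ok"


def scenario():
    server = ToyLicenseServer(seats=1)
    low = ToyClient("low-priority", server)   # takes the only seat
    server.force_release("low-priority")      # done while "low" is suspended
    high = ToyClient("high-priority", server)
    return low.resume(), high.believes_licensed, sorted(server.holders)


if __name__ == "__main__":
    print(scenario())
```

In this model the resumed client still believes it is licensed while the server has already handed the seat to someone else, which is the confused state described above.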
So for both GPUs and licenses I suspect you really do want preemption to
either cancel or requeue the job, rather than suspend it.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA