[slurm-users] Help with preemtion based on licenses

Oytun Peksel Oytun.Peksel at semcon.com
Wed Nov 6 12:32:34 UTC 2019


OK, I found out that it is possible to preempt on licenses if you define the license as a generic resource (GRES). For example:
GresTypes=license
NodeName=SomeNode Gres=license:someSoftware:100

And submit the jobs with --gres=license:someSoftware:20

But this does not work with PreemptMode=Suspend. Slurm will requeue or cancel the preempted job, but it won't suspend it. There is an interesting paragraph on the GRES scheduling page:

"Jobs will be allocated specific generic resources as needed to satisfy the request. If the job is suspended, those resources do not become available for use by other jobs."

This does not make sense to me. If a GPU is my generic resource, why would Slurm not release the GPU when a job is suspended?
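
To make the quoted paragraph concrete, here is a minimal toy model of the accounting it describes. This is only a sketch and not Slurm code: the Job and Node classes and their methods are hypothetical names made up for illustration. The one rule it encodes is taken from the quote: a suspended job keeps its generic resources, while a requeued job gives them back.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    gres: int               # units of license:someSoftware requested via --gres
    state: str = "PENDING"  # PENDING, RUNNING or SUSPENDED


@dataclass
class Node:
    total: int                                    # e.g. Gres=license:someSoftware:100
    jobs: List[Job] = field(default_factory=list)

    def free(self) -> int:
        # A suspended job is still counted: its GRES stays allocated to it.
        held = sum(j.gres for j in self.jobs if j.state in ("RUNNING", "SUSPENDED"))
        return self.total - held

    def start(self, job: Job) -> bool:
        if job.gres > self.free():
            return False          # not enough GRES, the job stays pending
        job.state = "RUNNING"
        self.jobs.append(job)
        return True

    def suspend(self, job: Job) -> None:
        job.state = "SUSPENDED"   # the allocation is kept

    def requeue(self, job: Job) -> None:
        job.state = "PENDING"     # the allocation is dropped
        self.jobs.remove(job)


node = Node(total=100)
low = [Job(f"low{i}", 20) for i in range(5)]   # five jobs use all 100 units
for j in low:
    node.start(j)

high = Job("high", 20)
node.suspend(low[0])
print(node.free(), node.start(high))   # 0 False
node.requeue(low[0])
print(node.free(), node.start(high))   # 20 True
```

Under that rule, suspending a low priority job frees nothing for the high priority job, which would be consistent with suspend being of no use for GRES-based preemption while requeue and cancel are.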



Oytun Peksel
oytun.peksel at semcon.com
Mobile   +46739205917


-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Oytun Peksel
Sent: den 6 november 2019 09:09
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Help with preemtion based on licenses

Yes, of course, no one would expect the resource manager to make the job's applications release their licenses.
Sometimes licenses are released automatically, or the release can be done by scripts.

The desired behavior here while using '--license someSoftware@someserver:x':
if there are not enough licenses, a running job should be suspended/cancelled/requeued/checkpointed, and Slurm should assume that its licenses are released.

In other words, just treat a license as any other resource, like CPU and memory. Nothing else. Today a license shortage automatically leaves the new job pending, which bypasses the preemption mechanism.

The above behavior is observed with the select/cons_tres plugin and the license defined as a TRES: "AccountingStorageTres=license/someSoftware".
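
The desired behavior described above can be sketched as a small selection routine. This is an illustration of the request only, not of anything Slurm does today: the Job tuple and the pick_victims function are hypothetical names, and the sketch assumes that a preempted job's licenses really do become free.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    name: str
    priority: int
    licenses: int


def pick_victims(running: List[Job], new_job: Job,
                 total_licenses: int) -> Optional[List[Job]]:
    """Return the running jobs to preempt so that new_job fits.

    [] means enough licenses are already free; None means the job cannot
    fit even after preempting every lower priority job.
    """
    in_use = sum(j.licenses for j in running)
    shortfall = new_job.licenses - (total_licenses - in_use)
    if shortfall <= 0:
        return []
    victims = []
    # Only lower priority jobs may be preempted, lowest priority first.
    candidates = sorted((j for j in running if j.priority < new_job.priority),
                        key=lambda j: j.priority)
    for job in candidates:
        victims.append(job)
        shortfall -= job.licenses   # assumption: preemption frees the licenses
        if shortfall <= 0:
            return victims
    return None


running = [Job("low1", 1, 40), Job("low2", 2, 40), Job("mid", 5, 20)]
print(pick_victims(running, Job("high", 10, 50), total_licenses=100))
```

With all 100 licenses in use, the high priority job asking for 50 would preempt low1 and low2 and leave mid alone, exactly as it would if the scarce resource were CPUs.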





-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Mark Hahn
Sent: den 5 november 2019 16:38
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Help with preemtion based on licenses

> The limiting factor in our cluster is licenses and I want to have high and low priority jobs where submitting a high priority job will preempt (suspend) a low priority job if all the licenses are already in use.

But what are you expecting to happen?  that Slurm will somehow release the license used by the suspended job, and then somehow reacquire the license when it is resumed?  I've never heard of that kind of thing even being offered by license managers, let alone that level of intimate integration between schedulers and license managers.

At most, a scheduler may provide a callout to query the number of free licenses, and consider a job eligible to start if its declared usage fits (gres in Slurm terms, I think).
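
As a rough sketch of that kind of callout (hypothetical names throughout; fake_license_query merely stands in for whatever would really ask the license server), the check amounts to comparing the job's declared usage with the free count:

```python
from typing import Callable


def eligible_to_start(declared_usage: int,
                      query_free_licenses: Callable[[], int]) -> bool:
    """A job is eligible if its declared license usage fits in what is free."""
    return declared_usage <= query_free_licenses()


def fake_license_query() -> int:
    # Hypothetical stand-in for a real license-server query.
    return 20


print(eligible_to_start(20, fake_license_query))   # True
print(eligible_to_start(21, fake_license_query))   # False
```

Note that nothing here takes a license away from a job that already holds one; the callout only gates when a new job may start.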

regards, mark hahn
--
operator may differ from spokesperson.            hahn at mcmaster.ca


