[slurm-users] Job canceled after reaching QOS limits for CPU time.

Zacarias Benta zacarias at lip.pt
Thu Oct 29 11:37:50 UTC 2020


Good morning everyone.


I'm having an "issue", and I don't know whether it is a "bug or a feature".
I've created a QOS:  "sacctmgr add qos myqos set GrpTRESMins=cpu=10 
flags=NoDecay".
I know the limit is too low, but I just wanted to give you guys an example.
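
To make the numbers concrete, here is a minimal shell sketch of how I understand the budget is consumed, assuming usage is charged as allocated CPUs times wall-clock minutes. The function names are made up for illustration and are not Slurm commands.

```shell
# Hypothetical helpers, not part of Slurm.
# Assumption: cpu usage against GrpTRESMins = allocated CPUs * elapsed minutes.

# cpu_minutes_used NCPUS ELAPSED_MINUTES
cpu_minutes_used() {
    echo $(( $1 * $2 ))
}

# minutes_until_limit LIMIT USED NCPUS
# Whole wall-clock minutes left before a job on NCPUS exhausts the budget.
minutes_until_limit() {
    limit=$1; used=$2; ncpus=$3
    remaining=$(( limit - used ))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo $(( remaining / ncpus ))
}

cpu_minutes_used 4 2         # prints 8
minutes_until_limit 10 0 4   # prints 2
minutes_until_limit 10 8 4   # prints 0
```

So with cpu=10, a 4-CPU job would get roughly two minutes of wall time before the limit is hit.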
Whenever a user submits a job under this QOS and the job reaches the 
limit I've defined, the job is canceled and I lose all the computation 
it had done so far.
Is there a QOS/Slurm setting that, when users reach the limit, moves 
the job to a pending state instead of canceling it?
That way I could increase the limit and return the job to Running so it 
can continue until it reaches completion.
I know this is a little bit odd, but I have users who have requested 
CPU time as per an agreement between our HPC center and their 
institutions. I know limits are set so they can be enforced; what I'm 
trying to prevent is, for example, a person having a job running for 2 
months and ending up with no data because they just needed a few 
more days. This could be prevented if the job went into a pending state 
after reaching the limit, since I could then grant them a couple more 
days of CPU time.
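
For reference, granting the extra days by hand would look something like the sketch below. It only builds and prints the sacctmgr command (a dry run) rather than executing it. The helper names are hypothetical, the QOS name and numbers are just the example values from above, and it assumes the new limit is the old one plus the extra CPU-minutes.

```shell
# Hypothetical helpers, not part of Slurm; they only print a command.

# extra_cpu_minutes DAYS NCPUS
# CPU-minutes needed to keep NCPUS busy for DAYS more days (1440 min/day).
extra_cpu_minutes() {
    echo $(( $1 * 1440 * $2 ))
}

# raise_limit_cmd QOS CURRENT_LIMIT EXTRA
# Prints the sacctmgr call that would raise GrpTRESMins for the QOS.
# -i makes sacctmgr commit without asking for confirmation.
raise_limit_cmd() {
    echo "sacctmgr -i modify qos $1 set GrpTRESMins=cpu=$(( $2 + $3 ))"
}

# Example: 3 more days for a 16-CPU job on top of the current limit of 10.
raise_limit_cmd myqos 10 "$(extra_cpu_minutes 3 16)"
```

Today this only helps if I run it before the job reaches the limit, which is exactly the problem.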


*Cumprimentos / Best Regards,*

Zacarias Benta
INCD @ LIP - Universidade do Minho

