[slurm-users] Job canceled after reaching QOS limits for CPU time.
Zacarias Benta
zacarias at lip.pt
Thu Oct 29 11:37:50 UTC 2020
Good morning everyone.
I'm having an "issue", and I don't know whether it is a bug or a feature.
I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10
flags=NoDecay".
I know the limit is too low, but I just wanted to give you an example.
Whenever a user submits a job with this QOS and the job reaches the
limit I've defined, the job is canceled and I lose all the computation
it had done so far.
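
To make the example limit concrete, here is a rough sketch of the arithmetic, assuming usage is charged as allocated CPUs times elapsed wall-clock time and that the QOS has no earlier usage recorded. The function name seconds_until_cpu_limit is a hypothetical helper for illustration, not a Slurm command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): estimate how many wall-clock
# seconds pass before a GrpTRESMins=cpu=<limit> budget is used up by jobs
# that together hold <cpus> CPUs. Assumes no prior usage on the QOS.
seconds_until_cpu_limit() {
    local limit_cpu_minutes=$1
    local allocated_cpus=$2
    # CPU-minutes are consumed at a rate of <allocated_cpus> per minute,
    # so the budget lasts limit / cpus minutes; convert to whole seconds.
    echo $(( limit_cpu_minutes * 60 / allocated_cpus ))
}

# The QOS above (cpu=10) with a single 4-CPU job:
seconds_until_cpu_limit 10 4   # prints 150
```

So with cpu=10, a 4-CPU job would hit the limit after about two and a half minutes.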
Is it possible to create a QOS/Slurm setting so that, when a job reaches
the limit, its state is changed to pending instead?
This way I could increase the limit and set the job back to running so it
can continue until it reaches completion.
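
For reference, the "increase the limits" step could look roughly like the sketch below. The wrapper raise_qos_cpu_limit and the RUN variable are hypothetical names for illustration; the value 20 is only an example. By default it just prints the sacctmgr command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print (or, with RUN=1, execute) the sacctmgr command that raises
# the CPU-minute budget of a QOS. raise_qos_cpu_limit is a hypothetical
# helper name; the QOS name and the new value are example inputs.
raise_qos_cpu_limit() {
    local qos=$1
    local new_cpu_minutes=$2
    # -i (--immediate) skips sacctmgr's interactive confirmation prompt.
    local cmd=(sacctmgr -i modify qos "$qos" set "GrpTRESMins=cpu=${new_cpu_minutes}")
    if [ "${RUN:-0}" = "1" ]; then
        "${cmd[@]}"        # really apply the change (needs Slurm admin rights)
    else
        echo "${cmd[*]}"   # dry run: only show the command
    fi
}

raise_qos_cpu_limit myqos 20
```

Raising the limit this way does not bring back a job that has already been canceled, which is exactly the gap I'm asking about.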
I know this is a little bit odd, but I have users who have requested
CPU time as per an agreement between our HPC center and their
institutions. I know limits are set so that they can be enforced; what I'm
trying to prevent is, for example, a person having a job running for 2
months and ending up without any data because they just needed a few
more days. This could be prevented if the job went into a pending state
after reaching the limit, so that I could grant them a couple more days
of CPU time.
*Cumprimentos / Best Regards,*
Zacarias Benta
INCD @ LIP - Universidade do Minho