[slurm-users] Advice on using GrpTRESRunMin=cpu=<limit>
David Baker
D.J.Baker at soton.ac.uk
Wed Feb 12 16:45:32 UTC 2020
Hello,
Before implementing "GrpTRESRunMin=cpu=<limit>" on our production cluster I'm doing some tests on the development cluster. I've only got a handful of compute nodes to play with, and so I have set the limit sensibly low. That is, I've set the limit to 576,000 CPU-minutes, which is equivalent to 400 CPU-days. In other words, I can potentially submit the following job...
1 job x 2 nodes x 80 CPUs/node x 2.5 days = 400 CPU-days (576,000 CPU-minutes)
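For reference, the limit was applied along these lines with sacctmgr (the account name "test_acct" below is just a placeholder, not our real hierarchy):

# Cap the running CPU-minutes for the association (576,000 min = 400 CPU-days)
sacctmgr modify account test_acct set GrpTRESRunMins=cpu=576000
# Confirm the limit is in place
sacctmgr show assoc where account=test_acct format=account,user,grptresrunmins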
I submitted a set of jobs, each requesting 2 nodes with 80 CPUs/node for 2.5 days. The first job is running and the rest are in the queue -- what I see makes sense...
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
677 debug myjob djb1 PD 0:00 2 (AssocGrpCPURunMinutesLimit)
678 debug myjob djb1 PD 0:00 2 (AssocGrpCPURunMinutesLimit)
679 debug myjob djb1 PD 0:00 2 (AssocGrpCPURunMinutesLimit)
676 debug myjob djb1 R 12:52 2 navy[54-55]
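For completeness, each of those jobs was submitted with something along these lines (the script name is just a placeholder):

# 2 nodes x 80 CPUs/node for 2.5 days = 400 CPU-days per job
sbatch -p debug -J myjob -N 2 --ntasks-per-node=80 -t 2-12:00:00 myjob.sh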
On the other hand, I expected the queued jobs not to accrue priority; however, they do appear to be doing so (see the sprio output below). I'm working with Slurm v19.05.2. Have I missed something vital in the config? We hoped that the queued jobs would not accrue priority. We haven't, for example, set PriorityFlags=ACCRUE_ALWAYS. Have I got that wrong? Could someone please advise us?
Best regards,
David
[root@navy51 slurm]# sprio
JOBID PARTITION PRIORITY SITE AGE FAIRSHARE JOBSIZE QOS
677 debug 5551643 100000 1644 450000 5000000 0
678 debug 5551643 100000 1644 450000 5000000 0
679 debug 5551642 100000 1643 450000 5000000 0
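For what it's worth, these are the checks I've run so far -- whether AccrueTime is the right field to be looking at here is just my guess:

# Check whether ACCRUE_ALWAYS appears in PriorityFlags
scontrol show config | grep -i PriorityFlags
# Inspect the accrue-related fields on one of the pending jobs
scontrol show job 677 | grep -i Accrue
# Long listing of the priority factors for the pending jobs
sprio -l -j 677,678,679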