[slurm-users] [ext] Re: Jobs getting StartTime 3 days in the future?

Holtgrewe, Manuel manuel.holtgrewe at bihealth.de
Mon Aug 31 18:57:19 UTC 2020


Thank you for your reply.

I think I found the issue. We have only a few "skylake" nodes and this job requests them (Features=skylake), so it can only run on the relatively few Skylake-generation CPU nodes.

d'oh!
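The effect of the feature constraint can be sketched as a simple filter: a node is a candidate only if it carries every requested feature and has enough free CPUs and memory. The node names, feature sets and free resources below are hypothetical. Only the request (16 CPUs, 16 x 7000M = 112000M, Features=skylake) comes from the scontrol output in the quoted message. Slurm's constraint syntax also supports AND/OR expressions and counts, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    features: set
    idle_cpus: int
    free_mem_mb: int


def eligible_nodes(nodes, required_features, cpus, mem_mb):
    """Nodes carrying every required feature with room for the request."""
    return [
        n for n in nodes
        if required_features <= n.features
        and n.idle_cpus >= cpus
        and n.free_mem_mb >= mem_mb
    ]


# Hypothetical inventory: three idle older nodes, two busy skylake nodes.
nodes = [
    Node("node01", {"haswell"}, 32, 128000),
    Node("node02", {"haswell"}, 32, 128000),
    Node("node03", {"haswell"}, 32, 128000),
    Node("node04", {"skylake"}, 0, 0),
    Node("node05", {"skylake"}, 4, 20000),
]

# Without the constraint the job fits on plenty of nodes ...
print([n.name for n in eligible_nodes(nodes, set(), 16, 112000)])
# ... with Features=skylake nothing is free right now.
print([n.name for n in eligible_nodes(nodes, {"skylake"}, 16, 112000)])
```

The partition as a whole looks mostly idle, yet the job still has to wait for one of the few matching nodes.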

--
Dr. Manuel Holtgrewe, Dipl.-Inform.
Bioinformatician
Core Unit Bioinformatics – CUBI
Berlin Institute of Health / Max Delbrück Center for Molecular Medicine in the Helmholtz Association / Charité – Universitätsmedizin Berlin

Visiting Address: Invalidenstr. 80, 3rd Floor, Room 03 028, 10117 Berlin
Postal Address: Chariteplatz 1, 10117 Berlin

E-Mail: manuel.holtgrewe at bihealth.de
Phone: +49 30 450 543 607
Fax: +49 30 450 7 543 901
Web: cubi.bihealth.org  www.bihealth.org  www.mdc-berlin.de  www.charite.de
________________________________
From: slurm-users [slurm-users-bounces at lists.schedmd.com] on behalf of Renfro, Michael [Renfro at tntech.edu]
Sent: Monday, August 31, 2020 19:36
To: Slurm User Community List
Subject: [ext] Re: [slurm-users] Jobs getting StartTime 3 days in the future?

One pending job in this partition should have a reason of “Resources”. That job has the highest priority, and if your job below would delay the highest-priority job’s start, it’ll get pushed back like you see here.
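The behaviour described above can be illustrated with a toy model of backfill planning. This is a simplified sketch, not Slurm's implementation: it assumes a single pool of CPUs and ignores memory, features, partitions and QOS limits. Like Slurm's backfill scheduler, it plans with each job's time limit rather than its actual runtime. The partition size, job names and times are hypothetical.

```python
def free_cpus_at(t, total, allocations):
    """CPUs not allocated at time t; allocations are (start, end, cpus) tuples."""
    return total - sum(c for s, e, c in allocations if s <= t < e)


def plan(total, now, running, pending):
    """Toy backfill planner over a single pool of CPUs.

    running: list of (cpus, end_time) for jobs already executing.
    pending: list of (name, cpus, time_limit), highest priority first.
    Each pending job gets the earliest start that does not disturb any
    allocation already made for a higher-priority job, then reserves its slot.
    """
    allocations = [(now, end, cpus) for cpus, end in running]
    starts = {}
    for name, cpus, limit in pending:
        # A job can only become startable now or when some allocation ends.
        candidates = sorted({now} | {e for _, e, _ in allocations if e > now})
        for t in candidates:
            # Free CPUs only drop at allocation starts, so checking t and every
            # allocation boundary inside (t, t + limit) is sufficient.
            points = [t] + [b for s, e, _ in allocations for b in (s, e) if t < b < t + limit]
            if all(free_cpus_at(p, total, allocations) >= cpus for p in points):
                starts[name] = t
                allocations.append((t, t + limit, cpus))
                break
    return starts


# Hypothetical 32-CPU partition, times in hours from now. A running job holds
# 16 CPUs for 10 more hours; "big" is the top-priority pending job.
result = plan(
    total=32,
    now=0,
    running=[(16, 10)],
    pending=[("big", 32, 24), ("long", 16, 24), ("short", 16, 8)],
)
print(result)  # {'big': 10, 'long': 34, 'short': 0}
```

"long" could use the 16 idle CPUs right now, but it would still be running at hour 10 when "big" needs all 32, so its start is pushed back to hour 34. "short" finishes before hour 10 and is backfilled immediately.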

On Aug 31, 2020, at 12:13 PM, Holtgrewe, Manuel <manuel.holtgrewe at bihealth.de> wrote:

Dear all,

I'm seeing a user's jobs get a StartTime 3 days in the future although there are plenty of resources available in the partition (and the user is well below the partition's MaxTRESPU).

Attached are our slurm.conf and the output of "sacctmgr list qos -P". I'd be grateful for any insight and happy to provide more information.

The scontrol show job output is as follows:

JobId=2902252 JobName=X
   UserId=X(X GroupId=X(X MCS_label=N/A
   Priority=796 Nice=0 Account=hpc-ag-kehr QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=23:59:00 TimeMin=N/A
   SubmitTime=2020-08-31T16:34:16 EligibleTime=2020-08-31T16:34:16
   AccrueTime=2020-08-31T16:34:16
   StartTime=2020-09-03T12:43:58 EndTime=2020-09-04T12:42:58 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-08-31T19:11:13
   Partition=medium AllocNode:Sid=med0107:7749
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=16 NumTasks=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=112000M,node=1,billing=16
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=7000M MinTmpDiskNode=0
   Features=skylake DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=X
   StdErr=X
   StdIn=/dev/null
   StdOut=X
   Power=
   MailUser=(null) MailType=NONE
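When triaging a job like this, the fields that matter here (JobState, Reason, StartTime, Features) can be pulled out of the scontrol show job output programmatically. A minimal sketch, assuming whitespace-separated key=value tokens, which holds for the fields used here but is not guaranteed for every field Slurm can print (free-text fields, for example). parse_scontrol is a hypothetical helper, not part of Slurm, and the sample is an excerpt of the output above.

```python
def parse_scontrol(text):
    """Parse `scontrol show job` output into a dict of key -> value strings.

    Assumes no value contains whitespace; tokens without '=' are skipped.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


sample = """
JobId=2902252 JobName=X
   Priority=796 Nice=0 Account=hpc-ag-kehr QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   StartTime=2020-09-03T12:43:58 EndTime=2020-09-04T12:42:58 Deadline=N/A
   Partition=medium AllocNode:Sid=med0107:7749
   Features=skylake DelayBoot=00:00:00
"""

job = parse_scontrol(sample)
for key in ("JobState", "Reason", "StartTime", "Features"):
    print(f"{key}: {job[key]}")
```

Reason=Priority means higher-priority jobs are ahead of this one, and a non-empty Features value narrows which nodes it can run on; both together explain a far-off StartTime.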


Best wishes,
Manuel

<qos.txt>
<slurm.conf>