[slurm-users] Jobs stuck with BeginTime and prolog exit status 99:0

Chandler admin at genome.arizona.edu
Tue May 17 16:27:11 UTC 2022


Could you help me figure out why our jobs are stuck in the pending (PD) state with reason BeginTime? For example:

              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              24458      defq cromwell smrtanal PD       0:00      1 (BeginTime)

# scontrol show job 24458
JobId=24458 JobName=cromwell_d72d675a_dataset_filter
    UserId=smrtanalysis(1002) GroupId=smrtanalysis(1002) MCS_label=N/A
    Priority=4294892709 Nice=0 Account=(null) QOS=normal
    JobState=PENDING Reason=BeginTime Dependency=(null)
    Requeue=1 Restarts=784 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
    SubmitTime=2022-05-17T09:23:03 EligibleTime=2022-05-17T09:25:04
    AccrueTime=2022-05-17T09:25:04
    StartTime=2022-05-17T09:25:04 EndTime=Unknown Deadline=N/A
    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-05-17T09:23:03
    Partition=defq AllocNode:Sid=EagI:2725352
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=(null)
    BatchHost=EagI
    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=1,node=1,billing=1
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
    Features=(null) DelayBoot=00:00:00
    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
    Command=(null)
    WorkDir=/data2/pacbio/smrtlink/jobs
    StdErr=/data2/pacbio/smrtlink/jobs/cromwell-executions/pb_export_ccs/441e90d6-263b-41a5-bbbb-5009d9a346d9/call-prepare_input/prepare_input/d72d675a-4df7-4e9f-8072-e722742f48e7/call-dataset_filter/execution/stderr
    StdIn=/dev/null
    StdOut=/data2/pacbio/smrtlink/jobs/cromwell-executions/pb_export_ccs/441e90d6-263b-41a5-bbbb-5009d9a346d9/call-prepare_input/prepare_input/d72d675a-4df7-4e9f-8072-e722742f48e7/call-dataset_filter/execution/stdout
    Power=
#
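
To make the relevant fields easier to compare across jobs, here is a minimal sketch that splits `scontrol show job` output into key/value pairs. It assumes no field value contains whitespace, which holds for the output above but not in general (a JobName or Command with spaces would break it). The function name `parse_scontrol_job` and the trimmed `sample` string are illustrative only.

```python
from datetime import datetime


def parse_scontrol_job(text):
    """Split `scontrol show job` output into a dict of field -> value.

    Assumes no value contains whitespace (true for the output quoted above).
    """
    fields = {}
    for token in text.split():
        # Split on the first '=' only, so TRES=cpu=1,node=1 keeps its value whole.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# A few lines copied from the scontrol output above.
sample = """JobId=24458 JobName=cromwell_d72d675a_dataset_filter
    JobState=PENDING Reason=BeginTime Dependency=(null)
    Requeue=1 Restarts=784 BatchFlag=1 Reboot=0 ExitCode=0:0
    SubmitTime=2022-05-17T09:23:03 EligibleTime=2022-05-17T09:25:04"""

job = parse_scontrol_job(sample)
delay = (datetime.fromisoformat(job["EligibleTime"])
         - datetime.fromisoformat(job["SubmitTime"]))
print(job["Reason"], job["Restarts"], delay.total_seconds())
# prints: BeginTime 784 121.0
```

The 121-second gap between SubmitTime and EligibleTime is computed from the values quoted above. It would be consistent with Slurm delaying a requeued job before it becomes eligible again, which together with Restarts=784 would explain the BeginTime reason, but that reading is an assumption rather than something the output states.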

/var/log/slurmctld:
[2022-05-17T09:20:44.366] Requeuing JobId=24458
[2022-05-17T09:23:03.068] backfill: Started JobId=24458 in defq on EagI
[2022-05-17T09:23:03.106] error: prolog_slurmctld JobId=24458 prolog exit status 99:0
[2022-05-17T09:23:03.114] Requeuing JobId=24458
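
The log lines above suggest a loop: the job starts, the slurmctld prolog fails with exit status 99:0, and the job is requeued. As a rough sketch for checking how widespread this is, the script below counts prolog failures and requeues per job. The regular expressions are derived only from the lines quoted above, so other Slurm versions may need different patterns; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns taken from the slurmctld log lines quoted above (assumed format).
PROLOG_ERR = re.compile(
    r"error: prolog_slurmctld JobId=(\d+) prolog exit status (\d+:\d+)"
)
REQUEUE = re.compile(r"Requeuing JobId=(\d+)")


def summarize(lines):
    """Count prolog failures per (job id, exit status) and requeues per job id."""
    prolog_failures = Counter()
    requeues = Counter()
    for line in lines:
        match = PROLOG_ERR.search(line)
        if match:
            prolog_failures[(match.group(1), match.group(2))] += 1
            continue
        match = REQUEUE.search(line)
        if match:
            requeues[match.group(1)] += 1
    return prolog_failures, requeues


# The four log lines from this message; for a real run, pass an open file
# instead, e.g. open("/var/log/slurmctld").
log_lines = [
    "[2022-05-17T09:20:44.366] Requeuing JobId=24458",
    "[2022-05-17T09:23:03.068] backfill: Started JobId=24458 in defq on EagI",
    "[2022-05-17T09:23:03.106] error: prolog_slurmctld JobId=24458 prolog exit status 99:0",
    "[2022-05-17T09:23:03.114] Requeuing JobId=24458",
]

prolog_failures, requeues = summarize(log_lines)
print(dict(prolog_failures))  # {('24458', '99:0'): 1}
print(dict(requeues))         # {'24458': 2}
```

If every requeue is paired with a prolog failure, the script configured as PrologSlurmctld is the likely place to look; `scontrol show config | grep -i prolog` should show which script that is.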

Thanks
-- 
Chandler Sobel-Sorenson (he/him) / Systems Administrator
Arizona Genomics Institute
www.genome.arizona.edu


