[slurm-users] jobs stuck in ReqNodeNotAvail,
Christian Anthon
anthon at rth.dk
Wed Nov 29 14:14:53 MST 2017
Thanks,
I believe the user must have resubmitted the job, hence the updated job ID.
Cheers, Christian
JobId=6986 JobName=Morgens
UserId=ferro(2166) GroupId=ferro(22166) MCS_label=N/A
Priority=1031 Nice=0 Account=rth QOS=normal
JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes: Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2017-11-29T21:02:38 EligibleTime=2017-11-29T21:02:38
StartTime=Unknown EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=panic AllocNode:Sid=rnai01:5765
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=16 NumTasks=1 CPUs/Task=16 ReqB:S:C:T=0:0:*:*
TRES=cpu=16,mem=32000,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
MinCPUsNode=16 MinMemoryCPU=2000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
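
Given SelectTypeParameters=CR_CPU_Memory and the job's mem=32000 request
(16 CPUs x MinMemoryCPU=2000M), the pending reason would be expected if the
nodes' memory, rather than their CPUs, is exhausted. One way to check that
(assuming a sinfo recent enough to support these --Format fields) is to
compare allocated against configured memory per node:

    # node-oriented view of the panic partition:
    # allocated/idle/other/total CPUs, configured memory, allocated memory
    sinfo -N -p panic -O nodehost,cpusstate,memory,allocmem,statecompact

If AllocMem is at or near Memory on every node, the job has nowhere to place
its 32000M even though idle CPUs remain.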
> Can you give us the output of
> # scontrol show job 6982
>
> Could be an issue with requesting too many CPUs or something…
>
>
> Merlin
> --
> Merlin Hartley
> Computer Officer
> MRC Mitochondrial Biology Unit
> Cambridge, CB2 0XY
> United Kingdom
>
>> On 29 Nov 2017, at 15:21, Christian Anthon <anthon at rth.dk> wrote:
>>
>> Hi,
>>
>> I have a problem with a newly set up slurm-17.02.7-1.el6.x86_64 where jobs
>> seem to be stuck in ReqNodeNotAvail:
>>
>> 6982 panic Morgens ferro PD 0:00 1 (ReqNodeNotAvail, UnavailableNodes:)
>> 6981 panic SPEC    ferro PD 0:00 1 (ReqNodeNotAvail, UnavailableNodes:)
>>
>> The nodes are fully allocated in terms of memory, but not all CPU
>> resources are consumed.
>>
>> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
>> _default  up    infinite  19    mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> _default  up    infinite  11    alloc alone[02-08,10-13]
>> fastlane  up    infinite  19    mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> fastlane  up    infinite  11    alloc alone[02-08,10-13]
>> panic     up    infinite  19    mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> panic     up    infinite  12    alloc alone[02-08,10-13,15]
>> free*     up    infinite  19    mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> free*     up    infinite  11    alloc alone[02-08,10-13]
>>
>> Possibly relevant lines in slurm.conf (full slurm.conf attached):
>>
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU_Memory
>> TaskPlugin=task/none
>> FastSchedule=1
>>
>> Any advice?
>>
>> Cheers, Christian.
>>
>> <slurm.conf>
>
>
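
If memory really is the limiting resource, a resubmission asking for less
memory per CPU should be able to start on the partly idle nodes. A minimal
sketch (the script name and the 1000M figure are placeholders, not the
user's actual job):

    sbatch --partition=panic --cpus-per-task=16 --mem-per-cpu=1000 morgens.sh

--cpus-per-task and --mem-per-cpu are standard sbatch options; the right
memory value of course depends on what the job actually needs.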