[slurm-users] jobs stuck in ReqNodeNotAvail,

Christian Anthon anthon at rth.dk
Wed Nov 29 14:14:53 MST 2017


Thanks,

I believe the user must have resubmitted the job, hence the updated job ID.

Cheers, Christian

JobId=6986 JobName=Morgens
   UserId=ferro(2166) GroupId=ferro(22166) MCS_label=N/A
   Priority=1031 Nice=0 Account=rth QOS=normal
   JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes: Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2017-11-29T21:02:38 EligibleTime=2017-11-29T21:02:38
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=panic AllocNode:Sid=rnai01:5765
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=16 NumTasks=1 CPUs/Task=16 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=32000,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=16 MinMemoryCPU=2000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
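For what it's worth, the numbers in the output above are self-consistent: with MinMemoryCPU=2000M and NumCPUs=16, the job needs 2000M x 16 = 32000M on a single node, which matches TRES mem=32000. A minimal sketch of that arithmetic (plain Python for illustration, not a Slurm API):

```python
# Values taken from the scontrol output above.
min_memory_per_cpu_mb = 2000   # MinMemoryCPU=2000M
cpus = 16                      # NumCPUs=16 (one task, CPUs/Task=16)

# Total memory Slurm must find free on one node to start this job.
total_mem_mb = min_memory_per_cpu_mb * cpus
print(total_mem_mb)  # matches TRES mem=32000
```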



> Can you give us the output of
> # scontrol show job 6982
>
> Could be an issue with requesting too many CPUs or something…
>
>
> Merlin
> --
> Merlin Hartley
> Computer Officer
> MRC Mitochondrial Biology Unit
> Cambridge, CB2 0XY
> United Kingdom
>
>> On 29 Nov 2017, at 15:21, Christian Anthon <anthon at rth.dk> wrote:
>>
>> Hi,
>>
>> I have a problem with a newly set up slurm-17.02.7-1.el6.x86_64 where jobs
>> seem to be stuck in ReqNodeNotAvail:
>>
>>               6982     panic  Morgens    ferro PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:)
>>               6981     panic     SPEC    ferro PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:)
>>
>> The nodes are fully allocated in terms of memory, but not all CPU
>> resources are consumed:
>>
>> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>> _default     up   infinite     19    mix clone[05-11,25-29,31-32,36-37,39-40,45]
>> _default     up   infinite     11  alloc alone[02-08,10-13]
>> fastlane     up   infinite     19    mix clone[05-11,25-29,31-32,36-37,39-40,45]
>> fastlane     up   infinite     11  alloc alone[02-08,10-13]
>> panic        up   infinite     19    mix clone[05-11,25-29,31-32,36-37,39-40,45]
>> panic        up   infinite     12  alloc alone[02-08,10-13,15]
>> free*        up   infinite     19    mix clone[05-11,25-29,31-32,36-37,39-40,45]
>> free*        up   infinite     11  alloc alone[02-08,10-13]
>>
>> Possibly relevant lines in slurm.conf (full slurm.conf attached)
>>
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU_Memory
>> TaskPlugin=task/none
>> FastSchedule=1
>>
>> Any advice?
>>
>> Cheers, Christian.
>>
>> <slurm.conf>
>
>
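Since the nodes show as "mix" (idle CPUs remaining) while the jobs still pend, memory is the likely constraint: with SelectTypeParameters=CR_CPU_Memory, both CPUs and memory are consumable resources, so a node whose memory is fully allocated cannot start a job even when CPUs sit idle. A hedged sketch of that admission check (illustrative only, not Slurm's actual cons_res code; the node figures below are hypothetical):

```python
def job_fits(free_cpus, free_mem_mb, req_cpus, req_mem_per_cpu_mb):
    """Under CR_CPU_Memory, a node can start the job only if BOTH the
    requested CPU count and the total requested memory are free."""
    return free_cpus >= req_cpus and free_mem_mb >= req_cpus * req_mem_per_cpu_mb

# Hypothetical "mix" node: 20 idle CPUs but only 8000M of unallocated memory.
# The job above needs 16 CPUs and 16 * 2000M = 32000M, so it cannot start.
print(job_fits(free_cpus=20, free_mem_mb=8000,
               req_cpus=16, req_mem_per_cpu_mb=2000))  # False: memory-bound
```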



