[slurm-users] job is pending but resources are available

Wed Oct 13 08:48:05 UTC 2021

On 10/13/21 16:30, Ole Holm Nielsen wrote:
> On 10/13/21 9:59 AM, Adam Xu wrote:
>>
>> On 2021/10/13 9:22, Brian Andrus wrote:
>>>
>>> Something is very odd when you have the node reporting:
>>>
>>> RealMemory=1 AllocMem=0 FreeMem=47563 Sockets=2 Boards=1
>>>
>>> What do you get when you run ‘slurmd -C’ on the node?
>>>
>> # slurmd -C
>> NodeName=apollo CPUs=36 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=128306
>> UpTime=22-16:14:48
>
> Maybe try to replace "Boards=1 SocketsPerBoard=2" by "Sockets=2". The 
> "Boards" used to give problems in Slurm 20.02, see 
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#compute-node-configuration
> but should be fixed in 20.11.
The node definition in my slurm.conf file is:
NodeName=apollo Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 Gres=gpu:v100:8,mps:800 State=UNKNOWN
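
For reference, here is a minimal Python sketch (not part of Slurm) that compares the hardware line printed by "slurmd -C" with the NodeName line from slurm.conf. Both strings are copied from this thread; the helper name parse_kv is made up for illustration. It relies on the documented slurm.conf behaviour that RealMemory defaults to 1 (MB) when it is not set, which would be consistent with the RealMemory=1 shown by "scontrol show node". Whether this is related to the pending job is not established here.

```python
def parse_kv(line: str) -> dict:
    """Split a Slurm 'Key=Value Key=Value ...' line into a dict (hypothetical helper)."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)

# Output of `slurmd -C` on the node, as quoted in this thread.
slurmd_c = ("NodeName=apollo CPUs=36 Boards=1 SocketsPerBoard=2 "
            "CoresPerSocket=18 ThreadsPerCore=1 RealMemory=128306")

# NodeName line from slurm.conf, as quoted in this thread.
slurm_conf = ("NodeName=apollo Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 "
              "Gres=gpu:v100:8,mps:800 State=UNKNOWN")

detected = parse_kv(slurmd_c)
configured = parse_kv(slurm_conf)

# slurm.conf documents a default of 1 (MB) for RealMemory when it is omitted.
effective_mem = int(configured.get("RealMemory", 1))
detected_mem = int(detected["RealMemory"])

if effective_mem != detected_mem:
    print(f"RealMemory in effect: {effective_mem} MB, "
          f"detected by slurmd -C: {detected_mem} MB")
```

If the mismatch matters, one option would be to add the RealMemory value reported by "slurmd -C" (or something slightly lower) to the NodeName line.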
>
> Your node apollo has CPUAlloc=28 CPUTot=36 which is OK, since there 
> are 8 available cores.
>
> Maybe your 8 GPUs are all in use by the 7 jobs?  Can you send your 
> gres.conf file defining GPUs?
# sinfo -O GresUsed
GRES_USED
gpu:0,mps:0

gpu:v100:7(IDX:0-6),

# cat gres.conf

Name=gpu Type=v100 File=/dev/nvidia0
Name=gpu Type=v100 File=/dev/nvidia1
Name=gpu Type=v100 File=/dev/nvidia2
Name=gpu Type=v100 File=/dev/nvidia3
Name=gpu Type=v100 File=/dev/nvidia4
Name=gpu Type=v100 File=/dev/nvidia5
Name=gpu Type=v100 File=/dev/nvidia6
Name=gpu Type=v100 File=/dev/nvidia7
Name=mps Count=100 File=/dev/nvidia0
Name=mps Count=100 File=/dev/nvidia1
Name=mps Count=100 File=/dev/nvidia2
Name=mps Count=100 File=/dev/nvidia3
Name=mps Count=100 File=/dev/nvidia4
Name=mps Count=100 File=/dev/nvidia5
Name=mps Count=100 File=/dev/nvidia6
Name=mps Count=100 File=/dev/nvidia7
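
As a rough cross-check of the two outputs above, the following Python sketch expands the IDX list from the GRES_USED string and subtracts it from the eight GPUs defined in gres.conf. The helper name parse_idx is made up for illustration, and the sketch assumes that GPU index N corresponds to /dev/nvidiaN in the order listed in gres.conf.

```python
import re

def parse_idx(spec: str) -> set:
    """Expand an index list such as '0-2,5' into {0, 1, 2, 5} (hypothetical helper)."""
    if spec == "N/A":  # shown when no device is allocated
        return set()
    out = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            out.update(range(int(lo), int(hi) + 1))
        else:
            out.add(int(part))
    return out

# GPU part of the GRES_USED column, copied from the sinfo output above.
gres_used = "gpu:v100:7(IDX:0-6)"

# The eight Name=gpu lines in gres.conf (assumed: index N <-> /dev/nvidiaN).
gpu_files = [f"/dev/nvidia{i}" for i in range(8)]

m = re.search(r"gpu:[^:]*:(\d+)\(IDX:([^)]*)\)", gres_used)
used_count = int(m.group(1))
used_idx = parse_idx(m.group(2))

# Indices defined in gres.conf but not listed as in use.
free_idx = sorted(set(range(len(gpu_files))) - used_idx)
print("GPUs in use:", used_count, "free indices:", free_idx)
```

By this accounting one GPU is still unallocated, which matches the expectation that a job asking for gpu:1 should not be blocked on GPUs alone.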

>
> To view the jobs on node apollo, I recommend downloading the "pestat"
> command from
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
>
> Just type "pestat -G" to view the GRES resources for each job on all 
> nodes.
Ok, I will try it later.
>
> Best regards,
> Ole
>
>
>
>>> *From: *Adam Xu <mailto:adam_xu at adagene.com.cn>
>>> *Sent: *Tuesday, October 12, 2021 6:07 PM
>>> *To: *slurm-users at lists.schedmd.com
>>> *Subject: *Re: [slurm-users] job is pending but resources are available
>>>
>>> On 2021/10/12 21:21, Adam Xu wrote:
>>>
>>>     Hi All,
>>>
>>>     OS: Rocky Linux 8.4
>>>
>>>     slurm version: 20.11.7
>>>
>>>     The partition is named apollo, and so is the node. The node has
>>>     36 CPU cores and 8 GPUs.
>>>
>>>     partition info
>>>
>>>     $ scontrol show partition apollo
>>>     PartitionName=apollo
>>>        AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>>>        AllocNodes=ALL Default=NO QoS=N/A
>>>        DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>>>        MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>>>        Nodes=apollo
>>>        PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:36
>>>        OverTimeLimit=NONE PreemptMode=OFF
>>>        State=UP TotalCPUs=36 TotalNodes=1 SelectTypeParameters=NONE
>>>        JobDefaults=(null)
>>>        DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>>>
>>>     node info
>>>
>>>     $ scontrol show node apollo
>>>     NodeName=apollo Arch=x86_64 CoresPerSocket=18
>>>        CPUAlloc=28 CPUTot=36 CPULoad=7.02
>>>        AvailableFeatures=(null)
>>>        ActiveFeatures=(null)
>>>        Gres=gpu:v100:8,mps:v100:800
>>>        NodeAddr=apollo NodeHostName=apollo Version=20.11.7
>>>        OS=Linux 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Wed Sep 15 19:12:32 UTC 2021
>>>        RealMemory=1 AllocMem=0 FreeMem=47563 Sockets=2 Boards=1
>>>        State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>>>        Partitions=apollo
>>>        BootTime=2021-09-20T23:43:49 SlurmdStartTime=2021-10-12T16:55:44
>>>        CfgTRES=cpu=36,mem=1M,billing=36
>>>        AllocTRES=cpu=28
>>>        CapWatts=n/a
>>>        CurrentWatts=0 AveWatts=0
>>>        ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>>        Comment=(null)
>>>
>>>     I now have 7 jobs running, but when I submit an 8th job it stays
>>>     pending with reason "Resources".
>>>
>>>     $ squeue
>>>       JOBID PARTITION     NAME     USER ST        TIME NODES NODELIST(REASON)
>>>         879    apollo    do.sh zhining_ PD        0:00     1 (Resources)
>>>         489    apollo    do.sh zhining_  R 13-12:50:45     1 apollo
>>>         490    apollo    do.sh zhining_  R 13-12:41:00     1 apollo
>>>         592    apollo runme-gp junwen_f  R  4-12:42:31     1 apollo
>>>         751    apollo runme-gp junwen_f  R  1-12:48:20     1 apollo
>>>         752    apollo runme-gp junwen_f  R  1-12:48:10     1 apollo
>>>         871    apollo runme-gp junwen_f  R     7:13:45     1 apollo
>>>         872    apollo runme-gp junwen_f  R     7:12:42     1 apollo
>>>
>>>     $ scontrol show job 879
>>>     JobId=879 JobName=do.sh
>>>        UserId=zhining_wan(1001) GroupId=zhining_wan(1001) MCS_label=N/A
>>>        Priority=4294900882 Nice=0 Account=(null) QOS=(null)
>>>        JobState=PENDING Reason=Resources Dependency=(null)
>>>        Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>>        RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>>>        SubmitTime=2021-10-12T16:29:29 EligibleTime=2021-10-12T16:29:29
>>>        AccrueTime=2021-10-12T16:29:29
>>>        StartTime=2021-10-12T21:17:41 EndTime=Unknown Deadline=N/A
>>>        SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-10-12T21:17:39
>>>        Partition=apollo AllocNode:Sid=sms:1281191
>>>        ReqNodeList=(null) ExcNodeList=(null)
>>>        NodeList=(null) SchedNodeList=apollo
>>>        NumNodes=1-1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>>        TRES=cpu=4,node=1,billing=4
>>>        Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>>        MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>>        Features=(null) DelayBoot=00:00:00
>>>        OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)
>>>        Command=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test/do.sh
>>>        WorkDir=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test
>>>        StdErr=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test/slurm-879.out
>>>        StdIn=/dev/null
>>>        StdOut=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test/slurm-879.out
>>>        Power=
>>>        TresPerNode=gpu:1
>>>        NtasksPerTRES:0
>>>
>>>     With 7 jobs running, the node still has 8 CPU cores and 1 GPU
>>>     free, so the remaining resources should be sufficient for a job
>>>     that requests 4 CPUs and 1 GPU. Why is the job pending with
>>>     reason "Resources"?
>>>
>>> Some information to add:
>>>
>>> I have killed some jobs with kill instead of scancel. Could this be
>>> the cause of this result?
>
>