<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 2021/10/12 21:21, Adam Xu wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:ae6124e8-cef5-bfcf-9c82-4f0c14257826@adagene.com.cn">Hi
      All,
      <br>
      <br>
      OS: Rocky Linux 8.4
      <br>
      <br>
      slurm version: 20.11.7
      <br>
      <br>
      The partition is named apollo, and so is its only node. The
      node has 36 CPU cores and 8 GPUs.
      <br>
      <br>
      partition info
      <br>
      <br>
      $ scontrol show partition apollo
      <br>
      PartitionName=apollo
      <br>
         AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
      <br>
         AllocNodes=ALL Default=NO QoS=N/A
      <br>
         DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO
      GraceTime=0 Hidden=NO
      <br>
         MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO
      MaxCPUsPerNode=UNLIMITED
      <br>
         Nodes=apollo
      <br>
         PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
      OverSubscribe=YES:36
      <br>
         OverTimeLimit=NONE PreemptMode=OFF
      <br>
         State=UP TotalCPUs=36 TotalNodes=1 SelectTypeParameters=NONE
      <br>
         JobDefaults=(null)
      <br>
         DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
      <br>
      <br>
      node info
      <br>
      <br>
      $ scontrol show node apollo
      <br>
      NodeName=apollo Arch=x86_64 CoresPerSocket=18
      <br>
         CPUAlloc=28 CPUTot=36 CPULoad=7.02
      <br>
         AvailableFeatures=(null)
      <br>
         ActiveFeatures=(null)
      <br>
         Gres=gpu:v100:8,mps:v100:800
      <br>
         NodeAddr=apollo NodeHostName=apollo Version=20.11.7
      <br>
         OS=Linux 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Wed Sep 15
      19:12:32 UTC 2021
      <br>
         RealMemory=1 AllocMem=0 FreeMem=47563 Sockets=2 Boards=1
      <br>
         State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
      MCS_label=N/A
      <br>
         Partitions=apollo
      <br>
         BootTime=2021-09-20T23:43:49
      SlurmdStartTime=2021-10-12T16:55:44
      <br>
         CfgTRES=cpu=36,mem=1M,billing=36
      <br>
         AllocTRES=cpu=28
      <br>
         CapWatts=n/a
      <br>
         CurrentWatts=0 AveWatts=0
      <br>
         ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
      <br>
         Comment=(null)
      <br>
      <br>
      I have 7 jobs running, but when I submit an 8th job, it stays
      pending with reason Resources.
      <br>
      <br>
      $ squeue
      <br>
                   JOBID PARTITION     NAME     USER ST       TIME NODES
      NODELIST(REASON)
      <br>
                     879    apollo    do.sh zhining_ PD       0:00 1
      (Resources)
      <br>
                     489    apollo    do.sh zhining_  R 13-12:50:45 1
      apollo
      <br>
                     490    apollo    do.sh zhining_  R 13-12:41:00 1
      apollo
      <br>
                     592    apollo runme-gp junwen_f  R 4-12:42:31 1
      apollo
      <br>
                     751    apollo runme-gp junwen_f  R 1-12:48:20 1
      apollo
      <br>
                     752    apollo runme-gp junwen_f  R 1-12:48:10 1
      apollo
      <br>
                     871    apollo runme-gp junwen_f  R    7:13:45 1
      apollo
      <br>
                     872    apollo runme-gp junwen_f  R    7:12:42 1
      apollo
      <br>
      <br>
      $ scontrol show job 879
      <br>
      JobId=879 JobName=do.sh
      <br>
         UserId=zhining_wan(1001) GroupId=zhining_wan(1001)
      MCS_label=N/A
      <br>
         Priority=4294900882 Nice=0 Account=(null) QOS=(null)
      <br>
         JobState=PENDING Reason=Resources Dependency=(null)
      <br>
         Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
      <br>
         RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
      <br>
         SubmitTime=2021-10-12T16:29:29 EligibleTime=2021-10-12T16:29:29
      <br>
         AccrueTime=2021-10-12T16:29:29
      <br>
         StartTime=2021-10-12T21:17:41 EndTime=Unknown Deadline=N/A
      <br>
         SuspendTime=None SecsPreSuspend=0
      LastSchedEval=2021-10-12T21:17:39
      <br>
         Partition=apollo AllocNode:Sid=sms:1281191
      <br>
         ReqNodeList=(null) ExcNodeList=(null)
      <br>
         NodeList=(null) SchedNodeList=apollo
      <br>
         NumNodes=1-1 NumCPUs=4 NumTasks=4 CPUs/Task=1
      ReqB:S:C:T=0:0:*:*
      <br>
         TRES=cpu=4,node=1,billing=4
      <br>
         Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
      <br>
         MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
      <br>
         Features=(null) DelayBoot=00:00:00
      <br>
         OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)
      <br>
Command=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test/do.sh
      <br>
WorkDir=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test
      <br>
StdErr=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test/slurm-879.out
      <br>
         StdIn=/dev/null
      <br>
StdOut=/home/zhining_wan/job/2021/20210603_ctla4_double_bilayer/final_pdb_minimize/amber/nolipid/test/slurm-879.out
      <br>
         Power=
      <br>
         TresPerNode=gpu:1
      <br>
         NtasksPerTRES:0
      <br>
      <br>
      With 7 jobs running, the node still has 8 CPU cores and 1 GPU free,
      so the remaining resources should be sufficient. Why is the job
      pending with reason "Resources"?
      <br>
    </blockquote>
    <p>Some information to add:</p>
    <p>I killed some jobs with kill instead of scancel. Could this be
      the cause of this result?</p>
  </body>
</html>