[slurm-users] job stuck as pending - reason "PartitionConfig"

byron lbgpublic at gmail.com
Thu Sep 30 12:04:47 UTC 2021


Bingo!

You were right, I was asking for more cores than were available (our highmem
nodes have fewer cores per node than our standard nodes).  I was so convinced
that the problem was related to my upgrading the OS on those nodes that it
never crossed my mind that it was something as straightforward as that.

Thanks for your help.
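
For anyone finding this thread later: the mismatch was visible in the
partition dump quoted below.  TotalCPUs=320 across TotalNodes=20 works out
to 16 cores per highmem node (assuming uniform nodes), so any job asking
for more CPUs on a node than that will pend with "PartitionConfig".  A
quick way to list per-node resources (a sketch; swap in your own partition
name):

    $ sinfo -p highmem -o '%n %c %m'   # hostname, CPUs, and memory (MB) per node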
On Wed, Sep 29, 2021 at 7:49 PM Paul Brunk <pbrunk at uga.edu> wrote:

> Hello Byron:
>
> I’m guessing that your job is asking for more HW than the highmem_p
> has in it, or more cores or RAM within a node than any of the nodes
> have, or something like that.  'scontrol show job 10860160' might
> help.  You can also look in slurmctld.log for that jobid.
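>
> For example, something like this (a sketch; the log path depends on
> SlurmctldLogFile in your slurm.conf):
>
>     $ scontrol show job 10860160 | grep -E 'NumNodes|NumCPUs|TRES'
>     $ grep 10860160 /var/log/slurm/slurmctld.log  # adjust to your SlurmctldLogFile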
>
> --
> Paul Brunk, system administrator
> Georgia Advanced Computing Resource Center
> Enterprise IT Svcs, the University of Georgia
>
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of byron
> Sent: Wednesday, September 29, 2021 10:35
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] job stuck as pending - reason "PartitionConfig"
>
>
> Hi
>
> When I try to submit a job to one of our partitions it just stays pending
> with the reason "PartitionConfig".  Can someone point me in the right
> direction for how to troubleshoot this?  I'm a bit stumped.
>
> Some details of the setup
>
> The Slurm version is 19.05.7
>
> This is the job that is stuck in state pending
>
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>           10860160   highmem MooseBen    byron PD       0:00     16 (PartitionConfig)
>
> $ sinfo -p highmem
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> highmem      up   infinite      1  drain intel-0012
> highmem      up   infinite     19   idle intel-[0001-0011,0013-0020]
>
> The output from scontrol show part
>
> PartitionName=highmem
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=N/A
>    DefaultTime=02:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=intel-00[01-20]
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
>    OverTimeLimit=NONE PreemptMode=REQUEUE
>    State=UP TotalCPUs=320 TotalNodes=20 SelectTypeParameters=NONE
>    JobDefaults=(null)
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED