[slurm-users] not allocating jobs even resources are free

Brian W. Johanson bjohanso at psc.edu
Fri Apr 24 19:17:29 UTC 2020


If you haven't looked at the man page for slurm.conf, it will answer
most if not all of your questions:
https://slurm.schedmd.com/slurm.conf.html. That said, I would rely on
the man page distributed with the version you have installed, as
options do change between releases.

There is a ton of information in there that is tedious to get through,
but reading it multiple times opens many doors.

DefaultTime is listed in there as a partition option.
If you are scheduling gres/gpu resources, it's quite possible there are
cores available but no corresponding GPUs available.
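
For example, a per-partition default could be set in slurm.conf along
these lines (a sketch with illustrative names and values, not taken
from this thread):

    PartitionName=GPUsmall Nodes=node[18-19] DefaultTime=01:00:00 MaxTime=3-00:00:00 State=UP

Jobs submitted without --time then inherit the one-hour default. And to
check whether GPUs rather than cores are the exhausted resource on a
node, recent Slurm versions report GresUsed in 'scontrol -d show node
<nodename>'.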

-b

On 4/24/20 2:49 PM, navin srivastava wrote:
> Thanks Brian.
>
> I need to check the job order.
>
> Is there any way to define a default time limit for jobs when the
> user does not specify one?
>
> Also, what is the meaning of fairtree in the priority settings in the slurm.conf file?
>
> The set of nodes is different in each partition; does FIFO not care
> about partitioning?
> Is it strict ordering, meaning the job that came first goes first,
> and until it runs no other job is allowed to start?
>
> Also, priority is high for the gpusmall partition and low for normal
> jobs; the nodes of the normal partition are full, but gpusmall cores
> are available.
>
> Regards
> Navin
>
> On Fri, Apr 24, 2020, 23:49 Brian W. Johanson <bjohanso at psc.edu> wrote:
>
>     Without seeing the jobs in your queue, I would expect the next job
>     in FIFO order to be too large to fit in the currently idle resources.
>
>     Configure it to use the backfill scheduler:
>     SchedulerType=sched/backfill
>
>           SchedulerType
>                  Identifies the type of scheduler to be used. Note the
>     slurmctld daemon must be restarted for a change in scheduler type
>     to become effective (reconfiguring a running daemon has no effect
>     for this parameter). The scontrol command can be used to manually
>     change job priorities if desired. Acceptable values include:
>
>           sched/backfill
>                  For a backfill scheduling module to augment the
>     default FIFO scheduling. Backfill scheduling will initiate
>     lower-priority jobs if doing so does not delay the expected
>     initiation time of any higher-priority job. Effectiveness of
>     backfill scheduling is dependent upon users specifying job time
>     limits, otherwise all jobs will have the same time limit and
>     backfilling is impossible. Note documentation for the
>     SchedulerParameters option above. This is the default configuration.
>
>           sched/builtin
>                  This is the FIFO scheduler which initiates jobs in
>     priority order. If any job in the partition can not be scheduled,
>     no lower-priority job in that partition will be scheduled. An
>     exception is made for jobs that can not run due to partition
>     constraints (e.g. the time limit) or down/drained nodes. In that
>     case, lower-priority jobs can be initiated without impacting the
>     higher-priority job.
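>
>     For example, switching to backfill in slurm.conf could look
>     something like this (a sketch; the bf_* values are illustrative
>     tunables, not settings taken from this thread):
>
>         SchedulerType=sched/backfill
>         SchedulerParameters=bf_window=4320,bf_continue,bf_max_job_test=1000
>
>     bf_window is in minutes and should cover your longest allowed job;
>     restart slurmctld afterward, since a SchedulerType change is not
>     picked up by a reconfigure.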
>
>
>
>     Your partitions are set with MaxTime=INFINITE; if your users are
>     not specifying a reasonable time limit on their jobs, this won't
>     help either.
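>
>     A partition-level DefaultTime together with a finite MaxTime (as
>     sketched above) caps jobs that omit a limit, and users can always
>     request only what they need, e.g. (illustrative values):
>
>         sbatch --time=02:00:00 job.sh
>
>     Shorter, accurate requests are exactly what lets backfill slot
>     jobs into otherwise idle resources.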
>
>
>     -b
>
>
>     On 4/24/20 1:52 PM, navin srivastava wrote:
>>     In addition to the above, when I look at sprio for jobs in both
>>     partitions, it shows:
>>
>>     For the normal partition, all jobs show the same priority:
>>
>>      JOBID PARTITION   PRIORITY  FAIRSHARE
>>             1291352 normal           15789      15789
>>
>>     For GPUsmall, all jobs show the same priority:
>>
>>      JOBID PARTITION   PRIORITY  FAIRSHARE
>>             1291339 GPUsmall      21052      21053
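>>
>>     A quick way to see why particular jobs are pending is to ask
>>     squeue for the reason column, e.g. something like:
>>
>>         squeue -t PD -p GPUsmall -o "%.10i %.9P %.8u %.6t %.10M %R"
>>
>>     where %R prints the pending reason (Resources, Priority, etc.).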
>>
>>     On Fri, Apr 24, 2020 at 11:14 PM navin srivastava
>>     <navin.altair at gmail.com> wrote:
>>
>>         Hi Team,
>>
>>         We are facing an issue in our environment: the resources are
>>         free, but jobs go into the queued (PD) state instead of running.
>>
>>         I have attached the slurm.conf file here.
>>
>>         Scenario:
>>
>>         There are jobs in only two partitions:
>>         344 jobs are in PD state in the normal partition; the nodes
>>         belonging to the normal partition are full, so no more jobs
>>         can run there.
>>
>>         1300 jobs in the GPUsmall partition are queued; enough CPUs
>>         are available to execute them, but the jobs are not being
>>         scheduled on the free nodes.
>>
>>         There are no pending jobs in any other partition.
>>         For example, the node status of node18:
>>
>>         NodeName=node18 Arch=x86_64 CoresPerSocket=18
>>            CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
>>            AvailableFeatures=K2200
>>            ActiveFeatures=K2200
>>            Gres=gpu:2
>>            NodeAddr=node18 NodeHostName=node18 Version=17.11
>>            OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50
>>         UTC 2018 (0b375e4)
>>            RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1
>>            State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
>>         MCS_label=N/A
>>            Partitions=GPUsmall,pm_shared
>>            BootTime=2019-12-10T14:16:37
>>         SlurmdStartTime=2019-12-10T14:24:08
>>            CfgTRES=cpu=36,mem=1M,billing=36
>>            AllocTRES=cpu=6
>>            CapWatts=n/a
>>            CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>            ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>>         And node19:
>>
>>         NodeName=node19 Arch=x86_64 CoresPerSocket=18
>>            CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
>>            AvailableFeatures=K2200
>>            ActiveFeatures=K2200
>>            Gres=gpu:2
>>            NodeAddr=node19 NodeHostName=node19 Version=17.11
>>            OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04
>>         UTC 2018 (3090901)
>>            RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2 Boards=1
>>            State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
>>         MCS_label=N/A
>>            Partitions=GPUsmall,pm_shared
>>            BootTime=2020-03-12T06:51:54
>>         SlurmdStartTime=2020-03-12T06:53:14
>>            CfgTRES=cpu=36,mem=1M,billing=36
>>            AllocTRES=cpu=16
>>            CapWatts=n/a
>>            CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>            ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>>         Could you please help me understand what the reason might be?
