[slurm-users] not allocating jobs even resources are free
Brian W. Johanson
bjohanso at psc.edu
Wed Apr 29 19:15:04 UTC 2020
Navin,
Check out 'sprio'; it will show you how the job priority changes
with the weight changes you are making.
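For example (the exact columns depend on your Slurm version, so treat this as a rough sketch):

    sprio -w                      # show the configured priority factor weights
    sprio -l                      # long listing: per-job AGE, FAIRSHARE, PARTITION factors, etc.
    sprio -j 1291339,1291352      # compare the two jobs you posted earlier

Comparing the FAIRSHARE and PARTITION columns before and after a weight change makes it clear which factor dominates.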
-b
On 4/29/20 5:00 AM, navin srivastava wrote:
> Thanks Daniel.
> All the jobs have gone into the run state, so I can't provide the details
> now, but I will definitely reach out later if we see a similar issue.
>
> I am more interested in understanding FIFO combined with Fair Tree. It
> would be good if anybody could provide some insight on this combination,
> and also on how the behaviour will change if we enable backfilling here.
>
> What is the role of the Fair Tree here?
>
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=2
> PriorityUsageResetPeriod=DAILY
> PriorityWeightFairshare=500000
> PriorityFlags=FAIR_TREE
>
> Regards
> Navin.
>
>
>
> On Mon, Apr 27, 2020 at 9:37 PM Daniel Letai <dani at letai.org.il> wrote:
>
>     Are you sure there are enough resources available? The node is in the
>     MIXED state and is configured for both partitions, so it's possible
>     that earlier, lower-priority jobs are already running and blocking the
>     later jobs, especially since the scheduler is FIFO.
>
>
> It would really help if you pasted the results of:
>
> squeue
>
> sinfo
>
>
> As well as the exact sbatch line, so we can see how many resources
> per node are requested.
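>
>     Something like the following (the job script name and the counts are
>     just placeholders, please substitute your real submission):
>
>         sbatch -p GPUsmall -N 1 -n 4 --gres=gpu:1 -t 02:00:00 job.sh
>
>     The -n/-c and --gres values are what determine whether a MIXED node
>     still has room for the job.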
>
>
> On 26/04/2020 12:00:06, navin srivastava wrote:
>> Thanks Brian,
>>
>> As suggested, I went through the documentation. What I understood is that
>> Fair Tree drives the fairshare mechanism, and jobs should be scheduled
>> based on that.
>>
>> So it means job scheduling will be FIFO, but the priority will be decided
>> by fairshare; I am not sure whether the two conflict here. The priority of
>> the normal jobs is lower than the GPUsmall priority, so if resources are
>> available in the GPUsmall partition, the jobs should run there. No job is
>> pending because of GPU resources; the jobs do not request GPU resources at
>> all.
>>
>> Is there any article where I can see how fairshare works and which
>> settings should not conflict with it? The documentation never says that
>> FIFO should be disabled when fair-share is applied.
>>
>> Regards
>> Navin.
>>
>>
>>
>>
>>
>> On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson <bjohanso at psc.edu> wrote:
>>
>>
>>     If you haven't looked at the man page for slurm.conf, it will
>>     answer most if not all of your questions:
>>     https://slurm.schedmd.com/slurm.conf.html. I would, however, rely on
>>     the man page distributed with the version you have installed, as
>>     options do change.
>>
>> There is a ton of information that is tedious to get through
>> but reading through it multiple times opens many doors.
>>
>> DefaultTime is listed in there as a Partition option.
>>     If you are scheduling gres/gpu resources, it's quite possible that
>>     there are cores available with no corresponding GPUs available.
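>>
>>     A quick way to check (node and partition names are taken from your
>>     output, adjust as needed) might be:
>>
>>         sinfo -N -p GPUsmall -o '%N %G %C'       # gres and CPU counts (alloc/idle/other/total) per node
>>         scontrol show node node18 | grep -E 'Gres|TRES'
>>
>>     If the GPUs on a node are already allocated, any job that requests
>>     gres/gpu will stay queued even though idle cores remain.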
>>
>> -b
>>
>> On 4/24/20 2:49 PM, navin srivastava wrote:
>>> Thanks Brian.
>>>
>>> I need to check the job order.
>>>
>>> Is there any way to define a default time limit for a job when the user
>>> does not specify one?
>>>
>>> Also, what is the meaning of FAIR_TREE in the priority settings in the
>>> slurm.conf file?
>>>
>>> The sets of nodes in the partitions are different, and FIFO does not
>>> seem to care about partitions. Is it strict ordering, meaning the job
>>> that arrived first goes first, and until it runs no other job is
>>> allowed?
>>>
>>> Also, the priority is high for the GPUsmall partition and low for normal
>>> jobs; the nodes of the normal partition are full, but GPUsmall cores are
>>> available.
>>>
>>> Regards
>>> Navin
>>>
>>> On Fri, Apr 24, 2020, 23:49 Brian W. Johanson <bjohanso at psc.edu> wrote:
>>>
>>> Without seeing the jobs in your queue, I would expect
>>> the next job in FIFO order to be too large to fit in the
>>> current idle resources.
>>>
>>> Configure it to use the backfill scheduler:
>>> SchedulerType=sched/backfill
>>>
>>> SchedulerType
>>> Identifies the type of scheduler to be
>>> used. Note the slurmctld daemon must be restarted for a
>>> change in scheduler type to become effective
>>> (reconfiguring a running daemon has no effect for this
>>> parameter). The scontrol command can be used to
>>> manually change job priorities if desired. Acceptable
>>> values include:
>>>
>>> sched/backfill
>>> For a backfill scheduling module to
>>> augment the default FIFO scheduling. Backfill
>>> scheduling will initiate lower-priority jobs if doing so
>>> does not delay the expected initiation time of any
>>> higher priority job. Effectiveness of backfill
>>> scheduling is dependent upon users specifying job time
>>> limits, otherwise all jobs will have the same time limit
>>> and backfilling is impossible. Note documentation for
>>> the SchedulerParameters option above. This is the
>>> default configuration.
>>>
>>> sched/builtin
>>> This is the FIFO scheduler which
>>> initiates jobs in priority order. If any job in the
>>> partition can not be scheduled, no lower priority job in
>>> that partition will be scheduled. An exception is made
>>> for jobs that can not run due to partition constraints
>>> (e.g. the time limit) or down/drained nodes. In that
>>> case, lower priority jobs can be initiated and not
>>> impact the higher priority job.
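>>>
>>> If you do enable sched/backfill, the SchedulerParameters mentioned in
>>> that excerpt are worth a look as well. As a hedged starting point (the
>>> parameter names come from the same man page, the values are purely
>>> illustrative):
>>>
>>> SchedulerParameters=bf_window=2880,bf_continue,bf_max_job_test=500
>>>
>>> bf_window generally needs to cover your longest job time limits,
>>> otherwise the backfill scheduler cannot plan around those jobs.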
>>>
>>>
>>>
>>> Your partitions are set with MaxTime=INFINITE; if your users are not
>>> specifying a reasonable time limit for their jobs, backfill won't help
>>> either.
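>>>
>>> A per-partition DefaultTime (mentioned earlier) gives backfill something
>>> to work with when users omit -t. Purely as an illustration (the node
>>> list and the 4-hour value are made up; your attached slurm.conf has the
>>> real definition):
>>>
>>> PartitionName=GPUsmall Nodes=node[18-19] DefaultTime=04:00:00 MaxTime=INFINITE State=UP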
>>>
>>>
>>> -b
>>>
>>>
>>> On 4/24/20 1:52 PM, navin srivastava wrote:
>>>> In addition to the above, when I look at sprio for the jobs in both
>>>> partitions it shows:
>>>>
>>>> For the normal queue, all jobs show the same priority:
>>>>
>>>> JOBID PARTITION PRIORITY FAIRSHARE
>>>> 1291352 normal 15789 15789
>>>>
>>>> For GPUsmall, all jobs also show the same priority:
>>>>
>>>> JOBID PARTITION PRIORITY FAIRSHARE
>>>> 1291339 GPUsmall 21052 21053
>>>>
>>>> On Fri, Apr 24, 2020 at 11:14 PM navin srivastava
>>>> <navin.altair at gmail.com> wrote:
>>>>
>>>> Hi Team,
>>>>
>>>> We are facing an issue in our environment. Resources are free, but
>>>> jobs go into the queue (PD state) instead of running.
>>>>
>>>> I have attached the slurm.conf file here.
>>>>
>>>> Scenario:
>>>>
>>>> There are jobs in only 2 partitions:
>>>> 344 jobs are in the PD state in the normal partition; the nodes
>>>> belonging to the normal partition are full, so no more jobs can run
>>>> there.
>>>>
>>>> 1300 jobs in the GPUsmall partition are queued, and enough CPU is
>>>> available to execute them, but the jobs are not being scheduled on
>>>> the free nodes.
>>>>
>>>> There are no pending jobs in any other partition.
>>>>
>>>> For example, the node status of node18:
>>>>
>>>> NodeName=node18 Arch=x86_64 CoresPerSocket=18
>>>> CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
>>>> AvailableFeatures=K2200
>>>> ActiveFeatures=K2200
>>>> Gres=gpu:2
>>>> NodeAddr=node18 NodeHostName=node18 Version=17.11
>>>> OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17
>>>> 07:44:50 UTC 2018 (0b375e4)
>>>> RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2
>>>> Boards=1
>>>> State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
>>>> Owner=N/A MCS_label=N/A
>>>> Partitions=GPUsmall,pm_shared
>>>> BootTime=2019-12-10T14:16:37
>>>> SlurmdStartTime=2019-12-10T14:24:08
>>>> CfgTRES=cpu=36,mem=1M,billing=36
>>>> AllocTRES=cpu=6
>>>> CapWatts=n/a
>>>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>>> ExtSensorsJoules=n/s ExtSensorsWatts=0
>>>> ExtSensorsTemp=n/s
>>>>
>>>> node19:-
>>>>
>>>> NodeName=node19 Arch=x86_64 CoresPerSocket=18
>>>> CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
>>>> AvailableFeatures=K2200
>>>> ActiveFeatures=K2200
>>>> Gres=gpu:2
>>>> NodeAddr=node19 NodeHostName=node19 Version=17.11
>>>> OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31
>>>> 12:25:04 UTC 2018 (3090901)
>>>> RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2
>>>> Boards=1
>>>> State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
>>>> Owner=N/A MCS_label=N/A
>>>> Partitions=GPUsmall,pm_shared
>>>> BootTime=2020-03-12T06:51:54
>>>> SlurmdStartTime=2020-03-12T06:53:14
>>>> CfgTRES=cpu=36,mem=1M,billing=36
>>>> AllocTRES=cpu=16
>>>> CapWatts=n/a
>>>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>>> ExtSensorsJoules=n/s ExtSensorsWatts=0
>>>> ExtSensorsTemp=n/s
>>>>
>>>> Could you please help me understand what the reason could be?
>>>>
>>
> --
> Regards,
>
> Daniel Letai
> +972 (0)505 870 456
>