[slurm-users] not allocating jobs even resources are free
Brian W. Johanson
bjohanso at psc.edu
Fri Apr 24 19:17:29 UTC 2020
If you haven't looked at the man page for slurm.conf, it will answer
most if not all of your questions:
https://slurm.schedmd.com/slurm.conf.html. That said, I would rely on
the man page distributed with the version you have installed, as
options do change.
There is a ton of information that is tedious to get through, but
reading it multiple times opens many doors.
DefaultTime is listed there as a Partition option.
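For illustration (the partition and node names here are hypothetical,
adjust them to your own setup), a partition definition with a default
time limit might look like:

    PartitionName=normal Nodes=node[01-17] Default=YES MaxTime=INFINITE DefaultTime=02:00:00 State=UP

With DefaultTime set, jobs submitted without an explicit --time get
that limit, which gives the backfill scheduler something to plan with.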
If you are scheduling gres/gpu resources, it is quite possible there
are cores available with no corresponding GPUs available.
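A quick, rough way to compare free cores against the configured GPUs
per node is something along the lines of:

    sinfo -N -o "%N %G %C"

where %G shows the node's gres and %C shows CPU counts as
allocated/idle/other/total; scontrol show node <name> gives more detail.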
-b
On 4/24/20 2:49 PM, navin srivastava wrote:
> Thanks Brian.
>
> I need to check the jobs order.
>
> Is there any way to define a default time limit for a job if the user
> does not specify one?
>
> Also, what is the meaning of fairtree in the priority settings in the
> slurm.conf file?
>
> The sets of nodes are different in the partitions; does FIFO not care
> about partitions at all?
> Is it strict ordering, meaning the job that came first will go, and
> until it runs no other job is allowed?
>
> Also, the priority is high for the gpusmall partition and low for
> normal jobs; the nodes of the normal partition are full, but gpusmall
> cores are available.
>
> Regards
> Navin
>
> On Fri, Apr 24, 2020, 23:49 Brian W. Johanson <bjohanso at psc.edu> wrote:
>
> Without seeing the jobs in your queue, I would expect the next job
> in FIFO order to be too large to fit in the currently idle resources.
>
> Configure it to use the backfill scheduler:
> SchedulerType=sched/backfill
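> (Backfill behavior can also be tuned via SchedulerParameters; the
> values below are purely illustrative, not a recommendation:
> SchedulerParameters=bf_window=11520,bf_continue,bf_max_job_test=1000
> see the SchedulerParameters entry in the slurm.conf man page for what
> each option does.)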
>
> SchedulerType
> Identifies the type of scheduler to be used. Note
> the slurmctld daemon must be restarted for a change in scheduler
> type to become effective (reconfiguring a running daemon has no
> effect for this parameter). The scontrol command can be used to
> manually change job priorities if desired. Acceptable values include:
>
> sched/backfill
> For a backfill scheduling module to augment
> the default FIFO scheduling. Backfill scheduling will initiate
> lower-priority jobs if doing so does not delay the expected
> initiation time of any higher priority job. Effectiveness of
> backfill scheduling is dependent upon users specifying job time
> limits, otherwise all jobs will have the same time limit and
> backfilling is impossible. Note documentation for the
> SchedulerParameters option above. This is the default configuration.
>
> sched/builtin
> This is the FIFO scheduler which initiates
> jobs in priority order. If any job in the partition can not be
> scheduled, no lower priority job in that partition will be
> scheduled. An exception is made for jobs that can not run due to
> partition constraints (e.g. the time limit) or down/drained
> nodes. In that case, lower priority jobs can be initiated and not
> impact the higher priority job.
>
>
>
> Your partitions are set with MaxTime=INFINITE; if your users are not
> specifying a reasonable time limit for their jobs, backfill won't
> help either.
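> For example, a user could submit with an explicit limit, something
> like (the script name and limit here are just placeholders):
> sbatch --time=04:00:00 job.sh
> or you could set DefaultTime on the partitions so that jobs submitted
> without --time still get a finite limit.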
>
>
> -b
>
>
> On 4/24/20 1:52 PM, navin srivastava wrote:
>> In addition to the above, when I look at the sprio output for both
>> sets of jobs, it shows:
>>
>> For the normal queue, all jobs show the same priority:
>>
>> JOBID PARTITION PRIORITY FAIRSHARE
>> 1291352 normal 15789 15789
>>
>> For GPUsmall, all jobs show the same priority:
>>
>> JOBID PARTITION PRIORITY FAIRSHARE
>> 1291339 GPUsmall 21052 21053
>>
>> On Fri, Apr 24, 2020 at 11:14 PM navin srivastava
>> <navin.altair at gmail.com> wrote:
>>
>> Hi Team,
>>
>> We are facing an issue in our environment. Resources are free,
>> but jobs are going into the queued (PD) state and not running.
>>
>> I have attached the slurm.conf file here.
>>
>> Scenario:
>>
>> There are jobs in only 2 partitions:
>> 344 jobs are in the PD state in the normal partition; the nodes
>> belonging to the normal partition are full, so no more jobs
>> can run there.
>>
>> 1300 jobs in the GPUsmall partition are queued, and enough
>> CPUs are available to execute them, but I see the jobs are
>> not being scheduled on the free nodes.
>>
>> There are no pending jobs in any other partition.
>> e.g., node status for node18:
>>
>> NodeName=node18 Arch=x86_64 CoresPerSocket=18
>> CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07
>> AvailableFeatures=K2200
>> ActiveFeatures=K2200
>> Gres=gpu:2
>> NodeAddr=node18 NodeHostName=node18 Version=17.11
>> OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50
>> UTC 2018 (0b375e4)
>> RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1
>> State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
>> MCS_label=N/A
>> Partitions=GPUsmall,pm_shared
>> BootTime=2019-12-10T14:16:37
>> SlurmdStartTime=2019-12-10T14:24:08
>> CfgTRES=cpu=36,mem=1M,billing=36
>> AllocTRES=cpu=6
>> CapWatts=n/a
>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>> node19:
>>
>> NodeName=node19 Arch=x86_64 CoresPerSocket=18
>> CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43
>> AvailableFeatures=K2200
>> ActiveFeatures=K2200
>> Gres=gpu:2
>> NodeAddr=node19 NodeHostName=node19 Version=17.11
>> OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04
>> UTC 2018 (3090901)
>> RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2 Boards=1
>> State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
>> MCS_label=N/A
>> Partitions=GPUsmall,pm_shared
>> BootTime=2020-03-12T06:51:54
>> SlurmdStartTime=2020-03-12T06:53:14
>> CfgTRES=cpu=36,mem=1M,billing=36
>> AllocTRES=cpu=16
>> CapWatts=n/a
>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>> Could you please help me understand what the reason could be?
>>
>