Good sleuthing.

It would be nice if Slurm said something like Reason=Priority_Lower_Than_Job_XXXX so people could immediately find the culprit in such situations. Has anybody with a SchedMD subscription ever asked for something like that, or is there some reason why that information would be impossible (or too hard) to gather programmatically?
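
In the meantime, the sleuthing can be done by hand. A rough sketch (partition name and job id taken from the thread below):
-8<--
# pending jobs in the partition, highest priority first
squeue -p m3 -t PD --sort=-p -o "%.10i %.9Q %.8u %.12T %.30r %j"

# per-factor priority breakdown of the stuck job
sprio -j 113936 -l
-8<--
Whatever sits above the stuck job in that list is the likely culprit.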

On Tue, Dec 10, 2024 at 1:09 AM Diego Zuccato via slurm-users <slurm-users@lists.schedmd.com> wrote:
Found the problem: another job was blocking access to the reservation.
The strangest thing is that the node (gpu03) has always been reserved
for a project, and the blocking job did not explicitly request it (and even
if it had, it would have been denied access), but its state was:
    JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:gpu03
Dependency=(null)

Color me surprised...
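
Such blockers can be spotted with something like this (a rough sketch; the node name is the one from above):
-8<--
# pending jobs whose reason mentions gpu03
squeue -t PD -o "%.10i %.8u %.9Q %.40r" | grep -i gpu03
-8<--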

Diego

On 07/12/2024 10:03, Diego Zuccato via slurm-users wrote:
> Hi Davide.
>
> On 06/12/2024 16:42, Davide DelVento wrote:
>
>> I find it extremely hard to understand situations like this. I wish
>> Slurm were clearer about how it reports what it is doing, but I
>> digress...
> I agree. A "scontrol explain" command could be really useful to pinpoint
> the cause :)
>
>> I suspect that there are other job(s) with higher priority than
>> this one which are supposed to run on that node but cannot start,
>> maybe because the high-priority job(s) need(s) several nodes
>> and the other nodes are not available at the moment?
> That partition is a single node, and it's IDLE. If another job needed
> it, the node would be in PLANNED state (IIRC).
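>
> A quick way to check that (a sketch; partition name from this thread):
> -8<--
> # node-oriented view of the partition: the state should read "planned"
> # when backfill has already earmarked the node for a waiting job
> sinfo -p m3 -N -o "%N %T %C"
> -8<--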
>
>> Pure speculation, obviously, since I have no idea what the rest of
>> your cluster looks like, and what the rest of the workflow is, but the
>> clue/hint is:
>>
>>  > JobState=PENDING Reason=Priority Dependency=(null)
>>
>> Your job is pending because something else has higher priority. Going back
>> to my first sentence, I wish Slurm would say which other job
>> (maybe there is more than one, but one would suffice for this
>> investigation) is trumping this job's priority, so one could more
>> clearly understand what is going on without sleuthing.
> Couldn't agree more :) The scheduler is quite opaque in its decisions. :(
>
> Actually, the job the user submitted is not starting and has
> Reason=PartitionConfig. But QoS 'debug' (the one I'm using for testing)
> does have a higher priority (1000) than QoS 'long' (10, IIRC).
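>
> The two can be compared directly with something like this (a sketch; nothing here beyond what's already in the thread):
> -8<--
> # configured priority of each QoS
> sacctmgr show qos format=Name,Priority
>
> # weights applied to each priority factor (the QoS priority only
> # contributes if PriorityWeightQOS is non-zero)
> scontrol show config | grep -i PriorityWeight
> -8<--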
>
> Diego
>
>> On Fri, Dec 6, 2024 at 7:36 AM Diego Zuccato via slurm-users
>> <slurm-users@lists.schedmd.com> wrote:
>>
>>     Hello all.
>>     A user reported that a job wasn't starting, so I tried to replicate
>>     the request and got:
>>     -8<--
>>     [root@ophfe1 root.old]# scontrol show job 113936
>>     JobId=113936 JobName=test.sh
>>          UserId=root(0) GroupId=root(0) MCS_label=N/A
>>          Priority=1 Nice=0 Account=root QOS=long
>>          JobState=PENDING Reason=Priority Dependency=(null)
>>          Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>          RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
>>          SubmitTime=2024-12-06T13:19:36 EligibleTime=2024-12-06T13:19:36
>>          AccrueTime=2024-12-06T13:19:36
>>          StartTime=Unknown EndTime=Unknown Deadline=N/A
>>          SuspendTime=None SecsPreSuspend=0
>>     LastSchedEval=2024-12-06T13:21:32
>>     Scheduler=Backfill:*
>>          Partition=m3 AllocNode:Sid=ophfe1:855189
>>          ReqNodeList=(null) ExcNodeList=(null)
>>          NodeList=
>>          NumNodes=1-1 NumCPUs=96 NumTasks=96 CPUs/Task=1
>> ReqB:S:C:T=0:0:*:*
>>          TRES=cpu=96,mem=95000M,node=1,billing=1296
>>          Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>          MinCPUsNode=1 MinMemoryNode=95000M MinTmpDiskNode=0
>>          Features=(null) DelayBoot=00:00:00
>>          OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>>          Command=/home/root.old/test.sh
>>          WorkDir=/home/root.old
>>          StdErr=/home/root.old/%N-%J.err
>>          StdIn=/dev/null
>>          StdOut=/home/root.old/%N-%J.out
>>          Power=
>>
>>
>>     [root@ophfe1 root.old]# scontrol sho partition m3
>>     PartitionName=m3
>>          AllowGroups=ALL DenyAccounts=formazione AllowQos=ALL
>>          AllocNodes=ALL Default=NO QoS=N/A
>>          DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
>>     Hidden=NO
>>          MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO
>>     MaxCPUsPerNode=UNLIMITED
>>          Nodes=mtx20
>>          PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
>>     OverSubscribe=NO
>>          OverTimeLimit=NONE PreemptMode=CANCEL
>>          State=UP TotalCPUs=192 TotalNodes=1
>>     SelectTypeParameters=CR_SOCKET_MEMORY
>>          JobDefaults=(null)
>>          DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>>          TRES=cpu=192,mem=1150000M,node=1,billing=2592
>>          TRESBillingWeights=CPU=13.500,Mem=2.2378G
>>
>>     [root@ophfe1 root.old]# scontrol show node mtx20
>>     NodeName=mtx20 Arch=x86_64 CoresPerSocket=24
>>          CPUAlloc=0 CPUEfctv=192 CPUTot=192 CPULoad=0.00
>>          AvailableFeatures=ib,matrix,intel,avx
>>          ActiveFeatures=ib,matrix,intel,avx
>>          Gres=(null)
>>          NodeAddr=mtx20 NodeHostName=mtx20 Version=22.05.6
>>          OS=Linux 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 14:48:47
>>     UTC 2022
>>          RealMemory=1150000 AllocMem=0 FreeMem=1156606 Sockets=4 Boards=1
>>          MemSpecLimit=2048
>>          State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=8 Owner=N/A
>>     MCS_label=N/A
>>          Partitions=m3
>>          BootTime=2024-12-06T10:01:42 SlurmdStartTime=2024-12-06T10:02:54
>>          LastBusyTime=2024-12-06T10:51:58
>>          CfgTRES=cpu=192,mem=1150000M,billing=2592
>>          AllocTRES=
>>          CapWatts=n/a
>>          CurrentWatts=0 AveWatts=0
>>          ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>>     -8<--
>>
>>     So the node is free, the partition does not impose extra limits (used
>>     only for accounting factors) but the job does not start.
>>
>>     Any hints?
>>
>>     Tks
>>
>>     --
>>     Diego Zuccato
>>     DIFA - Dip. di Fisica e Astronomia
>>     Servizi Informatici
>>     Alma Mater Studiorum - Università di Bologna
>>     V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>>     tel.: +39 051 20 95786
>>
>

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com