Good sleuthing.
It would be nice if Slurm said something like Reason=Priority_Lower_Than_Job_XXXX so people could immediately find the culprit in such situations. Has anybody with a SchedMD subscription ever asked for something like that, or is there some reason why that information would be impossible (or too hard) to gather programmatically?
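In the meantime, the closest workaround I know of is listing the pending jobs that compete for the same partition, sorted by priority, and eyeballing whatever sits above yours; something like this (the partition name is just the one from your earlier mail, adjust as needed):
-8<--
# Pending jobs in the partition, highest priority first: anything listed
# above your job is a candidate "culprit".
squeue -p m3 -t PD --sort=-p -o "%.10i %.10Q %.10u %.8T %.30r"
-8<--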
On Tue, Dec 10, 2024 at 1:09 AM Diego Zuccato via slurm-users <slurm-users@lists.schedmd.com> wrote:
Found the problem: another job was blocking access to the reservation. The strangest thing is that the node (gpu03) has always been reserved for a project, and the blocking job did not explicitly request it (and even if it had, it would have been denied access), yet its state was:
JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:gpu03 Dependency=(null)
Color me surprised...
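In case it helps others, this is more or less how the culprit showed up (the node name is obviously the one from my case):
-8<--
# Which reservations exist and which nodes they cover
scontrol show reservations
# Pending jobs whose reason mentions the reserved node
squeue -t PD -o "%.10i %.10u %.40r" | grep gpu03
-8<--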
Diego
On 07/12/2024 10:03, Diego Zuccato via slurm-users wrote:
Hi Davide.
On 06/12/2024 16:42, Davide DelVento wrote:
I find it extremely hard to understand situations like this. I wish Slurm were clearer in how it reports what it is doing, but I digress...
I agree. A "scontrol explain" command could be really useful to pinpoint the cause :)
I suspect there are other job(s) with higher priority than this one that are supposed to run on that node but cannot start, maybe because these high-priority job(s) need several nodes and the other nodes are not available at the moment?
That partition is a single node, and it's IDLE. If another job needed it, the node would be in PLANNED state (IIRC).
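For what it's worth, this is the quick check I use (the exact state strings depend on the Slurm version):
-8<--
# Node-oriented view of the partition: state plus CPU counts (Allocated/Idle/Other/Total)
sinfo -p m3 -N -o "%.10N %.12T %.15C"
-8<--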
Pure speculation, obviously, since I have no idea what the rest of your cluster looks like or what the rest of the workflow is, but the clue/hint is
JobState=PENDING Reason=Priority Dependency=(null)
You are pending because something else has higher priority. Going back to my first sentence, I wish Slurm would say which other job (maybe there is more than one, but one would suffice for this investigation) is trumping this job's priority, so one could more clearly understand what is going on without sleuthing.
Couldn't agree more :) The scheduler is quite opaque in its decisions. :(
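The best I've found so far is comparing the priority factors of the pending jobs with sprio, which still leaves the sleuthing to the admin:
-8<--
# Per-job breakdown of the multifactor priority (age, fairshare, partition, QOS, ...)
sprio -l
# Or just for the job in question
sprio -j 113936 -l
-8<--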
Actually, the job the user submitted is not starting and has Reason=PartitionConfig. But QoS 'debug' (the one I'm using for testing) does have a higher priority (1000) than QoS 'long' (10, IIRC).
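If it's useful, the values can be double-checked like this (and, I suppose, the QOS factor only matters if PriorityWeightQOS is non-zero in slurm.conf):
-8<--
# Configured priority of each QOS
sacctmgr show qos format=Name%20,Priority
# Weights applied by the multifactor priority plugin
scontrol show config | grep -i PriorityWeight
-8<--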
Diego
On Fri, Dec 6, 2024 at 7:36 AM Diego Zuccato via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hello all.
A user reported that a job wasn't starting, so I tried to replicate the request and I get:
-8<--
[root@ophfe1 root.old]# scontrol show job 113936
JobId=113936 JobName=test.sh
   UserId=root(0) GroupId=root(0) MCS_label=N/A
   Priority=1 Nice=0 Account=root QOS=long
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
   SubmitTime=2024-12-06T13:19:36 EligibleTime=2024-12-06T13:19:36
   AccrueTime=2024-12-06T13:19:36
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-12-06T13:21:32 Scheduler=Backfill:*
   Partition=m3 AllocNode:Sid=ophfe1:855189
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=
   NumNodes=1-1 NumCPUs=96 NumTasks=96 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=96,mem=95000M,node=1,billing=1296
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=95000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/root.old/test.sh
   WorkDir=/home/root.old
   StdErr=/home/root.old/%N-%J.err
   StdIn=/dev/null
   StdOut=/home/root.old/%N-%J.out
   Power=

[root@ophfe1 root.old]# scontrol show partition m3
PartitionName=m3
   AllowGroups=ALL DenyAccounts=formazione AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=mtx20
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=CANCEL
   State=UP TotalCPUs=192 TotalNodes=1 SelectTypeParameters=CR_SOCKET_MEMORY
   JobDefaults=(null) DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRES=cpu=192,mem=1150000M,node=1,billing=2592
   TRESBillingWeights=CPU=13.500,Mem=2.2378G

[root@ophfe1 root.old]# scontrol show node mtx20
NodeName=mtx20 Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUEfctv=192 CPUTot=192 CPULoad=0.00
   AvailableFeatures=ib,matrix,intel,avx
   ActiveFeatures=ib,matrix,intel,avx
   Gres=(null)
   NodeAddr=mtx20 NodeHostName=mtx20 Version=22.05.6
   OS=Linux 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 14:48:47 UTC 2022
   RealMemory=1150000 AllocMem=0 FreeMem=1156606 Sockets=4 Boards=1
   MemSpecLimit=2048
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=8 Owner=N/A MCS_label=N/A
   Partitions=m3
   BootTime=2024-12-06T10:01:42 SlurmdStartTime=2024-12-06T10:02:54
   LastBusyTime=2024-12-06T10:51:58
   CfgTRES=cpu=192,mem=1150000M,billing=2592
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
-8<--
So the node is free and the partition does not impose extra limits (it's used only for accounting factors), but the job does not start. Any hints?
Tks
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com