Good to know. When I tested it (more than 10 years ago...) I couldn't make it work and the users got quite upset. So we switched to using partitions just to group homogeneous nodes, while QoSes provide limits and priorities. If that's not the issue, I have no idea what else it could be, sorry.

Diego

On 13/04/26 14:33, Massimo Sgaravatto wrote:
Hi
What do you mean by saying that you cannot have jobs from two partitions running concurrently on the same node? E.g. right now the node btc-dfa-gpu-02 is running jobs from both the qst and the onlycpus-opp partitions:
[sgaravat@cld-ter-ui-01 ~]$ squeue | grep btc-dfa
283558 onlycpus- myscript sgaravat  R       0:10      1 btc-dfa-gpu-02
283559 onlycpus- myscript sgaravat  R       0:10      1 btc-dfa-gpu-02
283560 onlycpus- myscript sgaravat  R       0:10      1 btc-dfa-gpu-02
283561 onlycpus- myscript sgaravat  R       0:10      1 btc-dfa-gpu-02
283562 onlycpus- myscript sgaravat  R       0:10      1 btc-dfa-gpu-02
283563 onlycpus- myscript sgaravat  R       0:10      1 btc-dfa-gpu-02
283382 qst       morun_ci barone    R 1-23:37:36      1 btc-dfa-gpu-02
283383 qst       morun_ci barone    R 1-23:37:36      1 btc-dfa-gpu-02
283388 qst       morun_mv barone    R 1-23:37:36      1 btc-dfa-gpu-02
283381 qst       morun_ci barone    R 1-23:37:37      1 btc-dfa-gpu-02
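To make the point concrete, the lines above can be grouped by their partition column with a few lines of plain Python (just a sketch over the squeue output quoted verbatim; note that squeue truncates partition names in its default format, hence "onlycpus-"):

```python
from collections import Counter

# The 'squeue | grep btc-dfa' lines quoted above: JOBID, PARTITION, NAME, ...
squeue_lines = """\
283558 onlycpus- myscript sgaravat R 0:10 1 btc-dfa-gpu-02
283559 onlycpus- myscript sgaravat R 0:10 1 btc-dfa-gpu-02
283560 onlycpus- myscript sgaravat R 0:10 1 btc-dfa-gpu-02
283561 onlycpus- myscript sgaravat R 0:10 1 btc-dfa-gpu-02
283562 onlycpus- myscript sgaravat R 0:10 1 btc-dfa-gpu-02
283563 onlycpus- myscript sgaravat R 0:10 1 btc-dfa-gpu-02
283382 qst morun_ci barone R 1-23:37:36 1 btc-dfa-gpu-02
283383 qst morun_ci barone R 1-23:37:36 1 btc-dfa-gpu-02
283388 qst morun_mv barone R 1-23:37:36 1 btc-dfa-gpu-02
283381 qst morun_ci barone R 1-23:37:37 1 btc-dfa-gpu-02
"""

# Count running jobs per (truncated) partition name on the node.
per_partition = Counter(line.split()[1] for line in squeue_lines.splitlines())
print(per_partition)  # prints two distinct partitions, both with running jobs
```

So the same node is simultaneously serving running jobs from two partitions.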
Cheers, Massimo
On Mon, Apr 13, 2026 at 2:18 PM Diego Zuccato via slurm-users <slurm-users@lists.schedmd.com> wrote:
IIRC, you cannot have jobs from two partitions running concurrently on the same node; the requested resources are irrelevant. It seems a node can only serve a single partition at a time.
Diego
On 13/04/26 13:02, Massimo Sgaravatto via slurm-users wrote:
> Dear all
>
> I (try to) manage a Slurm cluster composed of some CPU-only nodes and
> some worker nodes which also have GPUs:
>
> NodeName=cld-ter-[01-06] Sockets=2 CoresPerSocket=96 ThreadsPerCore=2 RealMemory=1536000 State=UNKNOWN
> NodeName=cld-ter-gpu-[01-05] Sockets=2 CoresPerSocket=96 ThreadsPerCore=2 Gres=gpu:nvidia-h100:4 RealMemory=1536000 State=UNKNOWN
>
> The GPU nodes are exposed through multiple partitions:
>
> PartitionName=gpus Nodes=cld-ter-gpu-[01-02] State=UP PriorityTier=20
> PartitionName=sparch Nodes=cld-ter-gpu-03 AllowAccounts=sparch,operators QoS=sparch State=UP PriorityTier=20
> PartitionName=geant4 Nodes=cld-ter-gpu-03 AllowAccounts=geant4,operators QoS=geant4 State=UP PriorityTier=20
> PartitionName=enipred Nodes=cld-ter-gpu-04 AllowAccounts=enipred,operators QoS=enipred State=UP PriorityTier=20
> PartitionName=enipiml Nodes=cld-ter-gpu-05 AllowAccounts=enipiml,operators QoS=enipiml State=UP PriorityTier=20
>
> We also set up a partition to allow CPU-only jobs on the GPU nodes, but
> these jobs should be preempted (killed and requeued) if jobs submitted
> to partitions with higher priorities require those resources:
>
> PreemptType=preempt/partition_prio
> PreemptMode=REQUEUE
> PartitionName=onlycpus-opp Nodes=cld-ter-gpu-[01-05],cld-dfa-gpu-06,btc-dfa-gpu-02 State=UP PriorityTier=10
>
> Now, I don't understand why this job [*], submitted to the onlycpus-opp
> partition, can't start running e.g.
> on cld-ter-gpu-01, since it has a lot of free resources:
>
> [sgaravat@cld-ter-ui-01 ~]$ scontrol show node cld-ter-gpu-01
> NodeName=cld-ter-gpu-01 Arch=x86_64 CoresPerSocket=96
>    CPUAlloc=8 CPUEfctv=384 CPUTot=384 CPULoad=5.93
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=gpu:nvidia-h100:4
>    NodeAddr=cld-ter-gpu-01 NodeHostName=cld-ter-gpu-01 Version=25.11.3
>    OS=Linux 5.14.0-611.45.1.el9_7.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Apr 1 05:56:53 EDT 2026
>    RealMemory=1536000 AllocMem=560000 FreeMem=1192357 Sockets=2 Boards=1
>    State=MIXED+PLANNED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=gpus,onlycpus-opp
>    BootTime=2026-04-09T10:39:35 SlurmdStartTime=2026-04-09T10:40:01
>    LastBusyTime=2026-04-09T11:54:46 ResumeAfterTime=None
>    CfgTRES=cpu=384,mem=1500G,billing=839,gres/gpu=4,gres/gpu:nvidia-h100=4
>    AllocTRES=cpu=8,mem=560000M,gres/gpu=4,gres/gpu:nvidia-h100=4
>    CurrentWatts=0 AveWatts=0
>
> I guess the "MIXED+PLANNED" is the answer, but as far as I can see only
> one job (283469) is planned for this worker node:
>
> [sgaravat@cld-ter-ui-01 ~]$ squeue --start | grep ter-gpu-01
>   JOBID PARTITION     NAME     USER ST          START_TIME NODES SCHEDNODES     NODELIST(REASON)
>  283469      gpus vllm-pod ciangott PD 2026-04-13T14:31:40     1 cld-ter-gpu-01 (Resources)
>
> But job 283469 doesn't require too many resources [**], so the two jobs
> could run together. Why can't job 283534 start?
> Any hints?
>
> Thanks, Massimo
>
> [*]
> [sgaravat@cld-ter-ui-01 ~]$ scontrol show job=283534
> JobId=283534 JobName=myscript.sh
>    UserId=sgaravat(5008) GroupId=tbadmin(5001) MCS_label=N/A
>    Priority=542954 Nice=0 Account=operators QOS=normal
>    JobState=RUNNING Reason=None Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    RunTime=00:00:41 TimeLimit=1-00:00:00 TimeMin=N/A
>    SubmitTime=2026-04-13T11:10:13 EligibleTime=2026-04-13T11:10:13
>    AccrueTime=2026-04-13T11:10:13
>    StartTime=2026-04-13T11:58:39 EndTime=2026-04-14T11:58:39 Deadline=N/A
>    PreemptEligibleTime=2026-04-13T11:58:39 PreemptTime=None
>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2026-04-13T11:58:39 Scheduler=Backfill
>    Partition=onlycpus-opp AllocNode:Sid=cld-ter-ui-01:3035857
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=btc-dfa-gpu-02
>    BatchHost=btc-dfa-gpu-02
>    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    ReqTRES=cpu=1,mem=100G,node=1,billing=26
>    AllocTRES=cpu=1,mem=100G,node=1,billing=26
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=100G MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    OverSubscribe=OK Contiguous=0 Licenses=(null) LicensesAlloc=(null)
>    Network=(null)
>    Command=/shared/home/sgaravat/myscript.sh
>    SubmitLine=sbatch myscript.sh
>    WorkDir=/shared/home/sgaravat
>    StdErr=/shared/home/sgaravat/JOB-myscript.sh.283534.4294967294.err
>    StdIn=/dev/null
>    StdOut=/shared/home/sgaravat/JOB-myscript.sh.283534.4294967294.out
>    MailUser=massimo.sgaravatto@pd.infn.it
>    MailType=INVALID_DEPEND,BEGIN,END,FAIL,REQUEUE,STAGE_OUT
>
> [**]
> [sgaravat@cld-ter-ui-01 ~]$ scontrol show job=283469
> JobId=283469 JobName=vllm-pod
>    UserId=ciangott(6054) GroupId=tbuser(6000) MCS_label=N/A
>    Priority=499703 Nice=0 Account=cms QOS=normal
>    JobState=PENDING Reason=Resources Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
>    SubmitTime=2026-04-13T06:48:37 EligibleTime=2026-04-13T06:48:37
>    AccrueTime=2026-04-13T06:48:37
>    StartTime=2026-04-13T14:31:40 EndTime=2026-04-14T14:31:40 Deadline=N/A
>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2026-04-13T11:59:48 Scheduler=Main
>    Partition=gpus AllocNode:Sid=cld-ter-ui-01:3015801
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList= SchedNodeList=cld-ter-gpu-01
>    NumNodes=1-1 NumCPUs=32 NumTasks=1 CPUs/Task=32 ReqB:S:C:T=0:0:*:*
>    ReqTRES=cpu=32,mem=190734M,node=1,billing=118,gres/gpu=2,gres/gpu:nvidia-h100=2
>    AllocTRES=(null)
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=32 MinMemoryNode=190734M MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    OverSubscribe=OK Contiguous=0 Licenses=(null) LicensesAlloc=(null)
>    Network=(null)
>    Command=.interlink/jobs/default-0c0257f8-d1ea-4135-a602-96c229ce8516/job.slurm
>    SubmitLine=sbatch .interlink/jobs/default-0c0257f8-d1ea-4135-a602-96c229ce8516/job.slurm
>    WorkDir=/shared/home/ciangott
>    StdErr=
>    StdIn=/dev/null
>    StdOut=/shared/home/ciangott/.interlink/jobs/default-0c0257f8-d1ea-4135-a602-96c229ce8516/job.out
>    TresPerNode=gres/gpu:nvidia-h100:2
>    TresPerTask=cpu=32
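For reference, the numbers in the records quoted above can be cross-checked with a few lines of plain Python (a sketch outside Slurm; the TRES strings and timestamps are copied verbatim from the scontrol/squeue output in this thread, and the backfill remark is a hypothesis to check, not a diagnosis):

```python
from datetime import datetime, timedelta

def parse_tres(tres):
    """Split a Slurm TRES string like 'cpu=384,mem=1500G,...' into a dict."""
    return dict(item.split("=", 1) for item in tres.split(","))

# Node cld-ter-gpu-01, from the 'scontrol show node' output above.
cfg = parse_tres("cpu=384,mem=1500G,billing=839,gres/gpu=4,gres/gpu:nvidia-h100=4")
alloc = parse_tres("cpu=8,mem=560000M,gres/gpu=4,gres/gpu:nvidia-h100=4")

free_cpus = int(cfg["cpu"]) - int(alloc["cpu"])
free_gpus = int(cfg["gres/gpu"]) - int(alloc["gres/gpu"])
# Most CPUs are free, but all four GPUs are allocated, which is
# consistent with job 283469 (gres/gpu=2) pending with Reason=Resources.
print(f"free CPUs: {free_cpus}, free GPUs: {free_gpus}")

# Backfill window on the planned node: job 283469 is expected to start at
# 14:31:40 (squeue --start), while job 283534 asks for a 1-day TimeLimit,
# so it fits in the window only if the scheduler decides the two requests
# do not conflict on the reserved resources.
planned_start = datetime.fromisoformat("2026-04-13T14:31:40")
last_sched_eval = datetime.fromisoformat("2026-04-13T11:59:48")
window = planned_start - last_sched_eval
fits = timedelta(days=1) <= window
print(f"backfill window: {window}, 1-day job fits inside it: {fits}")
```

This is only arithmetic on the quoted values, of course; it says nothing about how the backfill scheduler actually treats the PLANNED reservation.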
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com