[slurm-users] Cannot enable Gang scheduling

Kevin Broch kbroch at rivosinc.com
Fri Jan 13 11:16:11 UTC 2023


The problem might be that OverSubscribe is not enabled on the partition. Without it, I don't
believe the time slicing needed for gang scheduling will happen.

Can you do a "scontrol show partition" to verify that it is?
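
For example, the partition record printed by scontrol includes an OverSubscribe field, and for gang time slicing it needs to be FORCE rather than NO. Something along these lines (the output below is only an illustration, not taken from your cluster):

  $ scontrol show partition asimov01
  PartitionName=asimov01
     ...
     OverSubscribe=NO        <-- needs to be FORCE (e.g. FORCE:2) for time slicing
     ...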

On Thu, Jan 12, 2023 at 6:24 PM Helder Daniel <hdaniel at ualg.pt> wrote:

> Hi,
>
> I am trying to enable gang scheduling on a server with a 32-core CPU and 4 GPUs.
>
> However, with gang scheduling enabled, the CPU jobs (or GPU jobs) are not being
> preempted after the time slice, which is set to 30 seconds.
>
> Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores. The
> first 2 jobs launched are never preempted, and the 3rd job starves forever
> (or at least until one of the other 2 ends):
>
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                313  asimov01 cpu-only  hdaniel PD       0:00      1 (Resources)
>                311  asimov01 cpu-only  hdaniel  R       1:52      1 asimov
>                312  asimov01 cpu-only  hdaniel  R       1:49      1 asimov
>
> The same happens with GPU jobs: if I launch 5 jobs, each requiring one GPU,
> the 5th job never runs. Preemption is not happening at the configured time slice.
>
> I tried several combinations:
>
> SchedulerType=sched/builtin and sched/backfill
> SelectType=select/cons_tres and select/linear
>
> I'd appreciate any help and suggestions.
> The slurm.conf is below.
> Thanks
>
> ClusterName=asimov
> SlurmctldHost=localhost
> MpiDefault=none
> ProctrackType=proctrack/linuxproc # proctrack/cgroup
> ReturnToService=2
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/lib/slurm/slurmd
> SlurmUser=slurm
> StateSaveLocation=/var/lib/slurm/slurmctld
> SwitchType=switch/none
> TaskPlugin=task/none # task/cgroup
> #
> # TIMERS
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> #
> # SCHEDULING
> #FastSchedule=1 #obsolete
> SchedulerType=sched/builtin #backfill
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core    #CR_Core_Memory lets only one job run at a time
> PreemptType = preempt/partition_prio
> PreemptMode = SUSPEND,GANG
> SchedulerTimeSlice=30           #in seconds, default 30
> #
> # LOGGING AND ACCOUNTING
> #AccountingStoragePort=
> AccountingStorageType=accounting_storage/none
> #AccountingStorageEnforce=associations
> #ClusterName=bip-cluster
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/linux
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurm/slurmd.log
> #
> #
> # COMPUTE NODES
> #NodeName=asimov CPUs=64 RealMemory=500 State=UNKNOWN
> #PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> # Partitions
> GresTypes=gpu
> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=asimov01 Nodes=asimov Default=YES MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 State=UP
>
>
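
If OverSubscribe does turn out to be NO on asimov01, then (just a sketch on my part; FORCE:2 is only an example of how many gang-scheduled jobs to allow per resource) changing the partition line to something like

  PartitionName=asimov01 Nodes=asimov Default=YES MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 OverSubscribe=FORCE:2 State=UP

and restarting slurmctld (or running "scontrol reconfigure") should let SchedulerTimeSlice=30 actually rotate the jobs.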