[slurm-users] Cannot enable Gang scheduling
Helder Daniel
hdaniel at ualg.pt
Fri Jan 13 12:29:26 UTC 2023
PS: I checked the resources while running the 3 GPU jobs, which were
launched with:

sbatch --gpus-per-task=2 --cpus-per-task=1 cnn-multi.sh

The server has 64 logical cores (32 cores x 2 threads with hyperthreading):
cat /proc/cpuinfo | grep processor | tail -n1
processor : 63
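For a quick cross-check of the socket/core/thread split, lscpu reports the
same topology in one shot (a minimal sketch, assuming lscpu is installed):

# sockets, cores per socket, threads per core, and total logical CPUs
lscpu | grep -E '^(Socket|Core|Thread|CPU\(s\))'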
128 GB main memory:
hdaniel at asimov:~/Works/Turbines/02-CNN$ cat /proc/meminfo
MemTotal: 131725276 kB
MemFree: 106773356 kB
MemAvailable: 109398780 kB
Buffers: 161012 kB
(...)
And 4 GPUs, each with 16 GB of memory:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A4000    On   | 00000000:41:00.0 Off |                  Off |
| 45%   63C    P2    47W / 140W |  15370MiB / 16376MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000    On   | 00000000:42:00.0 Off |                  Off |
| 44%   63C    P2    45W / 140W |  15370MiB / 16376MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A4000    On   | 00000000:61:00.0 Off |                  Off |
| 50%   68C    P2    52W / 140W |  15370MiB / 16376MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A4000    On   | 00000000:62:00.0 Off |                  Off |
| 46%   64C    P2    47W / 140W |  15370MiB / 16376MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      2472      G   /usr/bin/gnome-shell                4MiB |
|    0   N/A  N/A    524228      C   /bin/python                     15352MiB |
|    1   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    524228      C   /bin/python                     15362MiB |
|    2   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A    524226      C   /bin/python                     15362MiB |
|    3   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A    524226      C   /bin/python                     15362MiB |
+-----------------------------------------------------------------------------+
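A more compact way to watch just the relevant numbers while the jobs run,
assuming a reasonably recent nvidia-smi, is its query interface (here
refreshing every 5 seconds):

nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu \
           --format=csv -l 5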
On Fri, 13 Jan 2023 at 12:08, Helder Daniel <hdaniel at ualg.pt> wrote:
> Hi Kevin
>
> I did a "scontrol show partition".
> OverSubscribe was not enabled.
> I enabled it in slurm.conf (note the OverSubscribe=FORCE below):
>
> (...)
> GresTypes=gpu
> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2
> State=UNKNOWN
> PartitionName=asimov01 OverSubscribe=FORCE Nodes=asimov Default=YES
> MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 State=UP
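>
> After editing slurm.conf I pushed the change to the running daemons; for a
> partition change something like this should be enough (assuming systemd
> manages the daemons):
>
> scontrol reconfigure    # re-read slurm.conf on slurmctld and the slurmds
> # or, for changes that reconfigure cannot apply:
> sudo systemctl restart slurmctld slurmd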
>
> but now gang scheduling works only for CPU jobs; it does not preempt GPU
> jobs. Launching 3 CPU-only jobs, each requiring 32 of the 64 cores, they
> are preempted after the time slice as expected:
>
> sbatch --cpus-per-task=32 test-cpu.sh
>
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>     352  asimov01 cpu-only  hdaniel  R       0:58      1 asimov
>     353  asimov01 cpu-only  hdaniel  R       0:25      1 asimov
>     351  asimov01 cpu-only  hdaniel  S       0:36      1 asimov
>
> But launching 3 GPU jobs, each requiring 2 of the 4 GPUs, the first 2 that
> start running are never preempted.
> The 3rd job stays pending on resources:
>
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>     356  asimov01      gpu  hdaniel PD       0:00      1 (Resources)
>     354  asimov01      gpu  hdaniel  R       3:05      1 asimov
>     355  asimov01      gpu  hdaniel  R       3:02      1 asimov
>
> Do I need to change anything else in the configuration to also support GPU
> gang scheduling?
> Thanks
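>
> To see what the gang scheduler is actually doing, it should also be
> possible to turn on its debug flag (DebugFlags=Gang, documented in
> slurm.conf(5)) and watch the controller log:
>
> # in slurm.conf, followed by "scontrol reconfigure":
> DebugFlags=Gang
> # then watch for gang/time-slice activity:
> tail -f /var/log/slurm/slurmctld.log | grep -i gang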
>
>
> ============================================================================
> scontrol show partition asimov01
> PartitionName=asimov01
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=YES QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=1 MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=asimov
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
>    State=UP TotalCPUs=64 TotalNodes=1 SelectTypeParameters=NONE
>    JobDefaults=DefCpuPerGPU=2
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> On Fri, 13 Jan 2023 at 11:16, Kevin Broch <kbroch at rivosinc.com> wrote:
>
>> The problem might be that OverSubscribe is not enabled? Without it, I don't
>> believe the time-slicing can be gang scheduled.
>>
>> Can you do a "scontrol show partition" to verify that it is?
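>>
>> If it isn't, the gang scheduling guide
>> (https://slurm.schedmd.com/gang_scheduling.html) suggests enabling it on
>> the partition; a sketch along those lines, where FORCE:2 lets up to two
>> jobs time-slice each resource (adjust the count; this is not your exact
>> config):
>>
>> PartitionName=asimov01 Nodes=asimov Default=YES MaxTime=INFINITE
>> OverSubscribe=FORCE:2 State=UP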
>>
>> On Thu, Jan 12, 2023 at 6:24 PM Helder Daniel <hdaniel at ualg.pt> wrote:
>>
>>> Hi,
>>>
>>> I am trying to enable gang scheduling on a server with a 32-core CPU and
>>> 4 GPUs.
>>>
>>> However, with gang scheduling enabled, CPU jobs (and GPU jobs) are not
>>> being preempted after the time slice, which is set to 30 seconds.
>>>
>>> Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores.
>>> The first 2 jobs launched are never preempted. The 3rd job starves
>>> forever (or at least until one of the other 2 ends):
>>>
>>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>     313  asimov01 cpu-only  hdaniel PD       0:00      1 (Resources)
>>>     311  asimov01 cpu-only  hdaniel  R       1:52      1 asimov
>>>     312  asimov01 cpu-only  hdaniel  R       1:49      1 asimov
>>>
>>> The same happens with GPU jobs. If I launch 5 jobs, each requiring one
>>> GPU, the 5th job will never run. Preemption is not happening at the
>>> specified time slice.
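>>>
>>> For reference, the test jobs are just long-running CPU burners; a
>>> hypothetical stand-in for test-cpu.sh (illustration only, not my actual
>>> script) could be:
>>>
>>> #!/bin/bash
>>> #SBATCH --job-name=cpu-only
>>> # spin one busy loop per allocated CPU for ~10 minutes
>>> for i in $(seq "$SLURM_CPUS_PER_TASK"); do
>>>     timeout 600 bash -c 'while :; do :; done' &
>>> done
>>> wait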
>>>
>>> I tried several combinations:
>>>
>>> SchedulerType=sched/builtin and backfill
>>> SelectType=select/cons_tres and linear
>>>
>>> I'd appreciate any help or suggestions.
>>> The slurm.conf is below.
>>> Thanks
>>>
>>> ClusterName=asimov
>>> SlurmctldHost=localhost
>>> MpiDefault=none
>>> ProctrackType=proctrack/linuxproc # proctrack/cgroup
>>> ReturnToService=2
>>> SlurmctldPidFile=/var/run/slurmctld.pid
>>> SlurmctldPort=6817
>>> SlurmdPidFile=/var/run/slurmd.pid
>>> SlurmdPort=6818
>>> SlurmdSpoolDir=/var/lib/slurm/slurmd
>>> SlurmUser=slurm
>>> StateSaveLocation=/var/lib/slurm/slurmctld
>>> SwitchType=switch/none
>>> TaskPlugin=task/none # task/cgroup
>>> #
>>> # TIMERS
>>> InactiveLimit=0
>>> KillWait=30
>>> MinJobAge=300
>>> SlurmctldTimeout=120
>>> SlurmdTimeout=300
>>> Waittime=0
>>> #
>>> # SCHEDULING
>>> #FastSchedule=1 #obsolete
>>> SchedulerType=sched/builtin #backfill
>>> SelectType=select/cons_tres
>>> SelectTypeParameters=CR_Core # CR_Core_Memory lets only one job run at a time
>>> PreemptType = preempt/partition_prio
>>> PreemptMode = SUSPEND,GANG
>>> SchedulerTimeSlice=30 #in seconds, default 30
>>> #
>>> # LOGGING AND ACCOUNTING
>>> #AccountingStoragePort=
>>> AccountingStorageType=accounting_storage/none
>>> #AccountingStorageEnforce=associations
>>> #ClusterName=bip-cluster
>>> JobAcctGatherFrequency=30
>>> JobAcctGatherType=jobacct_gather/linux
>>> SlurmctldDebug=info
>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>> SlurmdDebug=info
>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>> #
>>> #
>>> # COMPUTE NODES
>>> #NodeName=asimov CPUs=64 RealMemory=500 State=UNKNOWN
>>> #PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>>>
>>> # Partitions
>>> GresTypes=gpu
>>> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2
>>> State=UNKNOWN
>>> PartitionName=asimov01 Nodes=asimov Default=YES MaxTime=INFINITE
>>> MaxNodes=1 DefCpuPerGPU=2 State=UP
>>>
>>>
>
--
with best regards,
Helder Daniel
Universidade do Algarve
Faculdade de Ciências e Tecnologia
Departamento de Engenharia Electrónica e Informática
https://www.ualg.pt/pt/users/hdaniel