[slurm-users] NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ???

Xiang Gao qasdfgtyuiop at gmail.com
Sat Feb 9 20:26:01 UTC 2019


Hi Jeffrey and Antony,

Thanks a lot for your valuable help and all the information. I just tested on my
PC following your instructions while waiting for the running jobs on the
server to finish, and it works perfectly.
I tested by setting `SelectTypeParameters=CR_CPU` and configuring `CPUs=`
without specifying `CoresPerSocket=` or `ThreadsPerCore=`. This does give
the expected behavior I am looking for.
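
For reference, the minimal slurm.conf sketch I used for the test looks roughly
like this (the node name, CPU count, and memory below are placeholders for my
PC, not the real server):

SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# Only CPUs= is given; with no CoresPerSocket=/ThreadsPerCore=,
# every hardware thread is treated as a schedulable CPU
NodeName=testpc CPUs=8 RealMemory=16000 State=UNKNOWN
PartitionName=debug Nodes=testpc Default=YES MaxTime=INFINITE State=UP

Before editing slurm.conf on the real server I will double-check the detected
hardware line with `slurmd -C` on the node.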


Hi Cyrus,

Although I have not tested it on the server yet, I expect the solution above
will work there as well. Thanks!

The gres configuration (gres.conf) on the server is:
Name=gpu Type=gtx1080ti File=/dev/nvidia0
Name=gpu Type=gtx1080ti File=/dev/nvidia1
Name=gpu Type=titanv File=/dev/nvidia2
Name=gpu Type=titanv File=/dev/nvidia3
Name=gpu Type=titanv File=/dev/nvidia4
Name=gpu Type=v100 File=/dev/nvidia5
Name=gpu Type=gp100 File=/dev/nvidia6
Name=gpu Type=gp100 File=/dev/nvidia7
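
As an aside, if we later want to pin GPUs to particular cores for locality, my
understanding is that gres.conf also accepts an optional Cores= field on each
line; a hypothetical example (the core ranges are made up, not our real
topology):

Name=gpu Type=gtx1080ti File=/dev/nvidia0 Cores=0-1
Name=gpu Type=gtx1080ti File=/dev/nvidia1 Cores=2-3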

The submission script is:
#!/bin/bash
#SBATCH --job-name=US_Y285_TTP_GDP
#SBATCH --output=test_%j.out
#SBATCH --error=test_%j.err
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=600:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:1

These just look normal to me.
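
After submitting with sbatch I just check the allocation with scontrol, e.g.
(the script name and job ID here are only examples):

$ sbatch submit.sh
$ scontrol show job 12345 | grep NumCPUs

With the CR_CPU setup on my PC, a 1-CPU job now reports NumCPUs=1 as expected.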

Xiang Gao


Cyrus Proctor <cproctor at tacc.utexas.edu> wrote on Friday, February 8, 2019 at 12:40 PM:

> Xiang,
>
> From what I've read of the original question, gres.conf may be another place
> to verify that only one core is being allocated per GPU request:
> https://slurm.schedmd.com/gres.conf.html
>
> Seeing the run submission line and gres.conf might help others give you
> further advice.
>
> Regarding Jeffrey's email: the concept of oversubscription may be beneficial
> as an alternative to changing resource inventories:
> https://slurm.schedmd.com/cons_res_share.html
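>
> As a rough illustration only (the values are hypothetical), oversubscription
> is set per partition in slurm.conf, e.g.:
>
> PartitionName=queue Nodes=moria OverSubscribe=FORCE:2 Default=YES MaxTime=INFINITE State=UP
>
> which would allow up to two jobs to share each allocated resource; the page
> above covers how that interacts with cons_res.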
>
> Best,
>
> Cyrus
> On 2/8/19 9:44 AM, Jeffrey Frey wrote:
>
> Documentation for CR_CPU:
>
>
> CR_CPU
> CPUs are consumable resources. Configure the number of CPUs on each node,
> which may be equal to the count of cores or hyper-threads on the node
> depending upon the desired minimum resource allocation. The
> node's Boards, Sockets, CoresPerSocket and ThreadsPerCore may optionally be
> configured and result in job allocations which have improved locality; however
> doing so will prevent more than one job from being allocated on each
> core.
>
>
>
> So once you've configured node(s) with ThreadsPerCore=N, the cons_res
> plugin still forces tasks to span all threads on a core.  Elsewhere in the
> documentation it is stated:
>
>
> Note that Slurm can allocate resources to jobs down to the resolution
> of a core.
>
>
>
> So you MUST treat a thread as a core if you want to schedule individual
> threads.  I can confirm this using the config:
>
>
> SelectTypeParameters = CR_CPU_MEMORY
> NodeName=n[003,008] CPUS=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2
>
>
>
> After submitting a 1-cpu job, if I check the cpuset assigned to it on n003:
>
>
> $ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
> 4,12
>
>
>
> If I instead configure as:
>
>
> SelectTypeParameters = CR_Core_Memory
> NodeName=n[003,008] CPUS=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
>
>
>
> Slurm will schedule "cores" 0-15 to jobs, which the cpuset cgroup happily
> accepts.  A 1-cpu job then shows:
>
>
> $ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
> 2
>
>
>
> and a 2-cpu job shows:
>
>
> $ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
> 4,12
>
>
>
>
>
>
>
> On Feb 8, 2019, at 5:09 AM, Antony Cleave <antony.cleave at gmail.com> wrote:
>
> If you want Slurm to just ignore the difference between physical and
> logical cores, you can change
> SelectTypeParameters=CR_Core
> to
> SelectTypeParameters=CR_CPU
>
> It will then treat threads as CPUs and let you start the
> number of tasks you expect.
>
> Antony
>
> On Thu, 7 Feb 2019 at 18:04, Jeffrey Frey <frey at udel.edu> wrote:
> Your nodes are hyperthreaded (ThreadsPerCore=2).  Slurm always allocates
> _all threads_ associated with a selected core to jobs.  So you're being
> assigned both threads on core N.
>
>
> On our development-partition nodes we configure the threads as cores, e.g.
>
>
> NodeName=moria CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8
> ThreadsPerCore=1
>
>
> to force Slurm to schedule the threads separately.
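>
> (For reference, running `slurmd -C` on the node prints the node's detected
> hardware configuration, including CPUs, sockets, cores per socket, and
> threads per core, which is handy for comparing against slurm.conf.)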
>
>
>
> On Feb 7, 2019, at 12:10 PM, Xiang Gao <qasdfgtyuiop at gmail.com> wrote:
>
> Hi All,
>
> We configured Slurm on a server with 8 GPUs and 16 CPUs and want to use
> Slurm to schedule both CPU and GPU jobs. We observed unexpected
> behavior: although there are 16 CPUs, Slurm only schedules 8 jobs to
> run, even when some jobs do not ask for any GPU. If I inspect the detailed
> information using `scontrol show job`, I see something strange on jobs
> that ask for just 1 CPU:
>
> NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1
>
> If I understand these concepts correctly, since the number of nodes is 1, the
> number of tasks is 1, and the number of CPUs per task is 1, in principle there
> is no way the final number of CPUs should be 2. I'm not sure whether I have
> misunderstood the concepts, configured Slurm wrongly, or this is a bug. So I
> came here for help.
>
> Some related config are:
>
> # COMPUTE NODES
> NodeName=moria CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
> ThreadsPerCore=2 RealMemory=120000
> Gres=gpu:gtx1080ti:2,gpu:titanv:3,gpu:v100:1,gpu:gp100:2
> State=UNKNOWN
> PartitionName=queue Nodes=moria Default=YES MaxTime=INFINITE State=UP
>
> # SCHEDULING
> FastSchedule=1
> SchedulerType=sched/backfill
> GresTypes=gpu
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
>
> Best,
> Xiang Gao
>
>
>
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE  19716
> Office: (302) 831-6034  Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::