[slurm-users] [External] incorrect number of cpu's being reported in srun job

Prentice Bisbal pbisbal at pppl.gov
Tue Jun 22 21:06:27 UTC 2021


Yes, you need to use the cgroups plugin.
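
For reference, the cgroups plugin is normally enabled with TaskPlugin=task/cgroup in slurm.conf plus ConstrainCores=yes in cgroup.conf; check both against the documentation for your Slurm version before changing a production cluster. The Python sketch below is one way to confirm from inside a job whether the restriction is actually in effect. The helper name effective_cpu_count is made up for illustration, and it assumes a Linux node and that Slurm exports SLURM_CPUS_PER_TASK (which it does when --cpus-per-task is given).

```python
import os


def effective_cpu_count(env=None):
    """Return how many CPUs this process should use (illustrative helper)."""
    env = os.environ if env is None else env

    # CPUs the kernel allows this process to run on. With core
    # confinement active this should match the job's allocation.
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        # sched_getaffinity is not available on every platform.
        allowed = os.cpu_count() or 1

    # Assumption: Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is used.
    requested = env.get("SLURM_CPUS_PER_TASK", "")
    if requested.isdigit() and int(requested) > 0:
        # Never report more CPUs than the kernel will let us run on.
        return min(allowed, int(requested))
    return allowed


if __name__ == "__main__":
    print("CPUs on the node:        ", os.cpu_count())
    print("CPUs in affinity mask:   ", len(os.sched_getaffinity(0)))
    print("CPUs this job should use:", effective_cpu_count())
```

Run under the same srun command as in the original message, the affinity mask should drop from 256 to 8 once the plugin is active. If the third-party binary reads the affinity mask it should then pick up the smaller number; if it counts the CPUs on the node some other way, confinement alone may not change how many threads it starts.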


On Fri, Jun 18, 2021, 12:29 AM Sid Young <sid.young at gmail.com> wrote:

> G'Day all,
>
> I've had a question from a user of our new HPC; the following should
> explain it:
>
> ➜ srun -N 1 --cpus-per-task 8 --time 01:00:00 --mem 2g --pty python3
> Python 3.6.8 (default, Nov 16 2020, 16:55:22)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.cpu_count()
> 256
> >>> len(os.sched_getaffinity(0))
> 256
> >>>
>
> The output of os.cpu_count() is correct: there are 256 CPUs on the server,
> but the output of len(os.sched_getaffinity(0)) is still 256 when I was
> expecting it to be 8 - the number of CPUs this process is restricted to. Is
> my slurm command incorrect? When I run a similar test on XXXXXX I get the
> expected behaviour:
>
> ➜ qsub -I -l select=1:ncpus=4:mem=1gb
> qsub: waiting for job 9616042.pbs to start
> qsub: job 9616042.pbs ready
> ➜ python3
> Python 3.4.10 (default, Dec 13 2019, 16:20:47) [GCC] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.cpu_count()
> 72
> >>> len(os.sched_getaffinity(0))
> 4
> >>>
>
> This seems to be a problem for me as I have a program provided by a
> third-party company that keeps trying to run with 256 threads and crashes.
> The program is a compiled binary so I don't know if they're just grabbing
> the number of CPUs or correctly getting the scheduler affinity, but it
> seems as though TRI's HPC will return the total number of CPUs in any case.
> There aren't any options with the program to set the number of threads
> manually.
>
> My question to the group is what's causing this? Do I need a cgroups
> plugin?
>
> I think these are the relevant lines from the slurm.conf file:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
> ReturnToService=1
> CpuFreqGovernors=OnDemand,Performance,UserSpace
> CpuFreqDef=Performance
>
>
>
>
> Sid Young
> Translational Research Institute
>