[slurm-users] Having a possible cgroup issue?
Michael Robbert
mrobbert at mines.edu
Thu Dec 6 10:34:33 MST 2018
Wes,
You didn't list the Slurm command that you used to get your interactive session. In particular, did you ask Slurm for access to all 14 cores?
Also note that since MATLAB uses threads to distribute work among cores, you don't want to ask for multiple tasks (-n or --ntasks), as that gives each task a single core. Instead, allocate multiple CPUs to a single task with -c or --cpus-per-task to tell Slurm how many CPUs your MATLAB process should have access to.
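For example, something along these lines should give a 14-thread MATLAB (a sketch only -- the partition, time limit, and exact MATLAB invocation are placeholders for whatever you normally use):

  srun --ntasks=1 --cpus-per-task=14 --pty matlab -nodisplay

whereas --ntasks=14 would launch 14 separate single-CPU tasks, and each MATLAB process would again see only one core.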
Mike
On 12/6/18 10:05 AM, Anderson, Wes R wrote:
I took a look through the archives and did not see a clear answer to the issue I am seeing, so I thought I would go ahead and ask.
I am having a cluster issue with Slurm and hoped you might be able to help me out. I built a small test cluster to determine whether it might meet some compute needs I have, but I keep running into an issue where Slurm restricts MATLAB to a single CPU regardless of how many we request.
During testing I found the following:
When I log in to a MATLAB interactive session (outside of Slurm) and run "feature numcores", I get the following:
[screenshot: "feature numcores" reporting 14 cores]
Which is correct, as I have 14 cores and they are all available.
However, when I go into Slurm, request a MATLAB interactive session, and run the same command on the same computer:
[screenshot: the same command reporting only 1 core]
So, what I understand is that my cgroup settings in Slurm are restricting MATLAB to a single core. Is that correct? Also, how do I fix this?
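For what it is worth, a quick way to confirm it is the cgroup/affinity setup rather than MATLAB itself (assuming the usual Linux tools are available on the node) would be to compare, inside the Slurm session:

  taskset -cp $$                                        # CPU affinity the shell was given
  nproc                                                 # CPUs visible to the current process
  scontrol show job $SLURM_JOB_ID | grep -i NumCPUs     # what Slurm thinks it allocated

against the 14 cores the node actually has.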
Here is my cgroup.conf:
###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
########################################################
# W A R N I N G: This file is managed by Puppet #
# - - - - - - - changes are likely to be overwritten #
########################################################
#######################
CgroupAutomount=yes
#######################
# testing -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#ConstrainCores=no
#ConstrainRAMSpace=no
#ConstrainSwapSpace=no
# testing -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
#
ConstrainDevices=no
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
##MinRAMSpace=30
# TASK/CGROUP PLUGIN
# Constrain the job cgroup RAM to this percentage of the allocated memory.
#AllowedRAMSpace=10
AllowedRAMSpace=100
# TaskAffinity=<yes|no>
# If configured to "yes" then set a default task affinity to bind each
# step task to a subset of the allocated cores using
# sched_setaffinity. The default value is "no". Note: This feature
# requires the Portable Hardware Locality (hwloc) library to be
# installed.
TaskAffinity=yes
# MemorySwappiness=<number>
# Configure the kernel's priority for swapping out anonymous pages (such as program data)
# versus file cache pages for the job cgroup. Valid values are between 0 and 100, inclusive. A
# value of 0 prevents the kernel from swapping out program data. A value of 100 gives equal
# priority to swapping out file cache or anonymous pages. If not set, then the kernel's default
# swappiness value will be used. Either ConstrainRAMSpace or ConstrainSwapSpace must
# be set to yes in order for this parameter to be applied.
MemorySwappiness=0
#####################################################################################
# If compute nodes mount Lustre or NFS file systems, it may be a good idea to #
# configure cgroup.conf with: #
# ConstrainKmemSpace=no #
# #
# From <https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#activating-cgroups>             #
#####################################################################################
ConstrainKmemSpace=no
########################################################
# W A R N I N G: This file is managed by Puppet #
# - - - - - - - changes are likely to be overwritten #
########################################################
Thanks,
Wes
(A slurm neophyte)