[slurm-users] Fwd: Slurm/cgroups on a single head/compute node

Alex Chekholko alex at calicolabs.com
Wed Aug 21 16:44:00 UTC 2019


Hey David,

Which distro?  Which kernel version?  Which systemd version? Which SLURM
version?

Based on some paths in your variables, I'm guessing an Ubuntu distro with
Debian SLURM packages?
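
If it helps, here is a minimal sketch for collecting those four details in one go. It assumes a Linux host with /etc/os-release present and that `systemctl` and `sinfo` are on the PATH; the helper names (`parse_os_release`, `first_line`, `collect_versions`) are made up for this example. Any tool that is missing is reported as None rather than raising an error.

```python
import platform
import shutil
import subprocess
from pathlib import Path


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def first_line(cmd):
    """Return the first output line of cmd, or None if it cannot be run."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    lines = (out.stdout or out.stderr).splitlines()
    return lines[0] if lines else None


def collect_versions():
    """Gather distro, kernel, systemd and Slurm version strings."""
    os_release = Path("/etc/os-release")
    distro = parse_os_release(os_release.read_text()) if os_release.exists() else {}
    return {
        "distro": distro.get("PRETTY_NAME"),
        "kernel": platform.release(),
        "systemd": first_line(["systemctl", "--version"]),
        "slurm": first_line(["sinfo", "--version"]),
    }


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```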

Regards,
Alex

On Wed, Aug 21, 2019 at 5:24 AM David da Silva Pires <
david.pires at butantan.gov.br> wrote:

> Hi supers.
>
> I am configuring a server with slurm/cgroups. This server will be the
> only Slurm node, so it is both the head node and the compute node.
> To force users to submit Slurm jobs instead of running their
> processes directly on the server, I would like to use cgroups to confine
> user processes to a cpuset holding the last 8 CPUs (as if those CPUs were
> the head node). The other CPUs can be used by any Slurm job.
>
> I followed instructions from many sites on the internet, but the final
> configuration still does not do what I want. Processes started by normal
> users are indeed confined to the last 8 CPUs, but so are the Slurm jobs
> that they submit.
>
> It seems that because jobs run as normal users (not as the slurm user),
> they are also limited by the cgroups rules.
>
> Is it possible to achieve what I want?
>
> Here are my configuration files:
>
> ========================================================================
> /etc/cgrules.conf
> ========================================================================
> # <user> <controllers> <destination>
> root  cpu,cpuset,memory /
> slurm cpu,cpuset,memory /
> * cpu,cpuset,memory interactive
>
>
> ========================================================================
> /etc/cgconfig.conf
> ========================================================================
> group interactive {
>   cpu {
>      cpu.shares = 100;
>   }
>   cpuset {
>      cpuset.cpus = 216-223;
>      cpuset.cpu_exclusive = 1;
>      cpuset.mem_exclusive = 1;
>      cpuset.mem_hardwall = 1;
>      cpuset.memory_migrate = 0;
>      cpuset.memory_spread_page = 0;
>      cpuset.memory_spread_slab = 0;
>      cpuset.mems = 0;
>      cpuset.sched_load_balance = 0;
>      cpuset.sched_relax_domain_level = -1;
>   }
>   memory {
>      memory.limit_in_bytes = 8G;
>      memory.swappiness = 41;
>      memory.memsw.limit_in_bytes = 8G;
>   }
> }
>
>
> ========================================================================
> slurm.conf
> ========================================================================
> ControlMachine=vital
> ControlAddr=172.25.2.25
> AuthType=auth/munge
> CryptoType=crypto/munge
> GresTypes=gpu
> MaxTasksPerNode=216
> MpiDefault=none
> ProctrackType=proctrack/cgroup
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=slurm
> StateSaveLocation=/var/spool/slurm-llnl
> SwitchType=switch/none
> TaskPlugin=task/cgroup
> TaskPluginParam=sched
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> DefMemPerNode=998749
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
> AccountingStorageHost=vital
> AccountingStorageLoc=slurm_acct_db
> AccountingStoragePass=/var/run/munge/munge.socket.2
> AccountingStoragePort=6819
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageUser=slurm
> AccountingStoreJobComment=YES
> ClusterName=bioinfo
> JobCompHost=vital
> JobCompLoc=slurm_acct_db
> JobCompPass=aikeeCu4S
> JobCompPort=6819
> JobCompType=jobcomp/slurmdbd
> JobCompUser=slurm
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/cgroup
> SlurmctldDebug=verbose
> SlurmdDebug=verbose
> BurstBufferType=burst_buffer/generic
> NodeName=vital NodeAddr=172.25.2.25 CPUs=224 RealMemory=1031517 Sockets=4 CoresPerSocket=28 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 MemSpecLimit=32768
> PartitionName=batch Nodes=vital OverSubscribe=YES Default=YES MaxTime=INFINITE State=UP
>
>
> ========================================================================
> cgroup.conf
> ========================================================================
> CgroupMountpoint="/sys/fs/cgroup"
> CgroupAutomount=yes
> AllowedRAMSpace=100
> AllowedSwapSpace=0
> ConstrainCores=no
> ConstrainDevices=yes
> ConstrainKmemSpace=no
> ConstrainRAMSpace=no
> ConstrainSwapSpace=no
> MaxRAMPercent=100
> MaxSwapPercent=100
> TaskAffinity=no
>
>
> Thanks in advance for any help.
>
> --
> David da Silva Pires
>
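
One way to confirm the symptom described in the quoted message (Slurm jobs landing on the same 8 CPUs as interactive processes) is to print the CPU affinity of a process started both ways. The sketch below assumes Linux, where `os.sched_getaffinity` is available, and takes the CPU range 216-223 from the `cpuset.cpus` line in the quoted cgconfig.conf. The function names and the script name `check_cpus.py` are made up for this example.

```python
import os


def parse_cpu_list(spec):
    """Expand a cpuset-style list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


# CPU range of the "interactive" group in the quoted cgconfig.conf.
INTERACTIVE_CPUS = parse_cpu_list("216-223")


def confined_to_interactive(allowed):
    """True if every allowed CPU lies inside the interactive cpuset."""
    return bool(allowed) and set(allowed) <= INTERACTIVE_CPUS


if __name__ == "__main__":
    # os.sched_getaffinity is Linux-only; 0 means "the calling process".
    allowed = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else set()
    print("allowed CPUs:", sorted(allowed))
    print("confined to interactive cpuset:", confined_to_interactive(allowed))
```

Running it once from a login shell and once through Slurm (for example `srun python3 check_cpus.py`) shows whether the job step is being placed in the same cpuset as interactive processes. The contents of /proc/self/cgroup from inside the job would be a useful thing to include in a follow-up as well.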