On a single Rocky8 workstation with one GPU, we wanted interactive ssh
logins to get a small portion of its resources (shell, compiling, simple
data manipulations, console desktop, etc.) and the rest to go to SLURM.
We did this:

- Set it to use cgroupv2

  * Modify /etc/default/grub to add systemd.unified_cgroup_hierarchy=1
    to GRUB_CMDLINE_LINUX. Remake grub with grub2-mkconfig

  * Create file /usr/etc/cgroup_cpuset_init with the lines

      #!/bin/bash
      echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
      echo "+cpuset" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control

  * Modify/create /etc/systemd/system/slurmd.service.d/override.conf
    so it has:

      [Service]
      ExecStartPre=-/usr/etc/cgroup_cpuset_init

- Figure out the exact cores to use for "free user" use and the cores for
  SLURM. Also use GPU sharding in SLURM so the GPU can be shared.

  * Install hwloc-ls

  * Run 'hwloc-ls' to translate physical cores 0-9 to logical cores.
    For me P 0-9 was Logical 0,2,4,6,8,10,12,14,16,18

  * In /etc/slurm.conf the NodeName definition has

      CPUs=128 Boards=1 SocketsPerBoard=1 CoresPerSocket=64 ThreadsPerCore=2 \
      RealMemory=257267 MemSpecLimit=20480 \
      CpuSpecList=0,2,4,6,8,10,12,14,16,18 \
      TmpDisk=6000000 Gres=gpu:nvidia_a2:1,shard:nvidia_a2:32

    reserving those 10 cores and 20GB of RAM for "free user"

  * gres.conf has the lines:

      AutoDetect=nvml
      Name=shard Count=32

  * Need to add gres/shard to GresTypes= too. Job submissions use the
    option --gres=shard:N where N is less than 32

- Set up systemd to restrict "free users" to cores 0-9 and the 20GB

  * Run:

      systemctl set-property user.slice MemoryHigh=20480M

  * Run for every individual user on the system

      systemctl set-property user-$uid.slice AllowedCPUs=0-9

    where $uid is that user's user ID. We do this in a script that also
    runs sacctmgr to add them to the SLURM system.

    I could not just set this once for user.slice itself, which is what I
    first tried, because it then restricted the root user too and that
    caused weird behavior with a lot of system tools.
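A minimal sketch of how the per-user AllowedCPUs step could be scripted. It is not the script mentioned above and leaves out the sacctmgr part entirely. The function names restrict_user and regular_uids, the DRY_RUN switch, and the UID >= 1000 cutoff for "regular" accounts are assumptions made for illustration. By default it only prints the commands it would run.

```bash
#!/bin/bash
# Sketch only: restrict each regular user's slice to the "free user" cores.
# Assumptions (not from the post): regular accounts have UID >= 1000,
# UID 65534 (nobody) is skipped, and DRY_RUN=1 prints instead of running.

FREE_CPUS="0-9"            # cores left to interactive users, as in the post
MIN_UID=1000               # assumed lower bound for regular accounts
DRY_RUN="${DRY_RUN:-1}"    # set DRY_RUN=0 to actually call systemctl

# Print (or run) the systemctl command for one numeric UID.
restrict_user() {
    uid="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "systemctl set-property user-${uid}.slice AllowedCPUs=${FREE_CPUS}"
    else
        systemctl set-property "user-${uid}.slice" "AllowedCPUs=${FREE_CPUS}"
    fi
}

# Read passwd-format lines on stdin, print the UIDs of regular accounts.
regular_uids() {
    awk -F: -v min="$MIN_UID" '$3 >= min && $3 != 65534 { print $3 }'
}

# Possible use (as root, after reviewing the dry-run output):
#   getent passwd | regular_uids | while read -r uid; do restrict_user "$uid"; done
```

Keeping the command in one small function makes it easy to review the dry-run output before applying anything, and root (UID 0) is never touched because of the UID cutoff.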
So far the root/daemon processes work fine within the 20GB limit, though,
so that MemoryHigh=20480M on user.slice is one and done.

Then reboot.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)