Hi, I am running Slurm (v22.05.8) on 3 nodes, each with the following specs:
OS: Proxmox VE 8.1.4 x86_64 (based on Debian 12)
CPU: AMD EPYC 7662 (64 cores / 128 threads)
GPU: NVIDIA GeForce RTX 4070 Ti
Memory: 128 GB
This is /etc/slurm/slurm.conf on all 3 nodes, with the comment lines removed:
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=debug3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
I want to reserve a few cores and a few GB of RAM for exclusive use by the OS, so that jobs managed by Slurm can never touch them. What configuration do I need to achieve this?
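From reading the slurm.conf man page, my guess is that the CoreSpecCount and MemSpecLimit options on the NodeName line are meant for exactly this, but I am not sure whether anything else (e.g. cgroup enforcement) is also required. The values below (4 cores and 8192 MB) are just placeholders to show what I have in mind:
# Guess: reserve 4 cores and 8192 MB of RAM on every node for the OS
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=4 MemSpecLimit=8192
Is that the right mechanism?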
Is it possible to reserve, in a similar fashion, a 'percentage' of the GPU that Slurm cannot exceed, so that the OS keeps some GPU resources for itself?
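The closest thing I have found so far is gres/mps, but as far as I can tell it only splits a GPU between jobs by share count rather than holding a share back for the OS. If it is relevant at all, my understanding is that the setup would look roughly like the following (the share count of 100 and the /dev/nvidia0 device path are assumptions on my part):
# slurm.conf (guess)
GresTypes=gpu,mps
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1,mps:100
# gres.conf on each node (guess)
Name=gpu File=/dev/nvidia0
Name=mps Count=100 File=/dev/nvidia0
Is there any way to do something like this for the OS rather than only between jobs?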
Is it possible to have these configs be different for each of the 3 nodes?
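If per-node values are possible, my assumption is that I would simply split the single NodeName line into one entry per node (or per group of nodes) with different values, something like this (the numbers are made up):
NodeName=server1 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=8 MemSpecLimit=16384
NodeName=server[2-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=2 MemSpecLimit=4096
Is that the right approach, or does Slurm expect per-node overrides to be done differently?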
Thanks!