Hi, I am running Slurm (v22.05.8) on 3 nodes, each with the following specs:

OS: Proxmox VE 8.1.4 x86_64 (based on Debian 12)
CPU: AMD EPYC 7662 (128)
GPU: NVIDIA GeForce RTX 4070 Ti
Memory: 128 GB
This is /etc/slurm/slurm.conf on all 3 computers without the comment lines:

ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=debug3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
I want to reserve a few cores and a few GB of RAM exclusively for the OS, so that jobs managed by Slurm cannot touch them. What configuration do I need to achieve this?
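From skimming the slurm.conf man page, I think CoreSpecCount and MemSpecLimit might be the relevant parameters, but I am not sure whether they are actually enforced with my current setup (e.g. whether proctrack/linuxproc is enough, or whether I also need ConstrainCores=yes and ConstrainRAMSpace=yes in cgroup.conf). As a rough, untested sketch of what I have in mind, the node definition would become something like:

# Placeholder values: reserve 4 cores and 8 GiB (MemSpecLimit is in MB) for the OS
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=4 MemSpecLimit=8192

Is that the right direction, or is there a better mechanism?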
Is it possible to reserve, in a similar fashion, a percentage of the GPU that Slurm jobs cannot exceed, so that the OS always has some GPU resources left?
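I don't know whether Slurm can do this at all, but to illustrate the intent (this is made-up syntax, not something I found in the docs), I mean something along the lines of:

# Made-up, illustrative only: leave 20% of the GPU for the OS; Slurm jobs share the remaining 80%
NodeName=server[1-3] Gres=gpu:1 GpuReservePercent=20

Is there any real mechanism (gres.conf, MPS, sharding, etc.) that comes close to this?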
Finally, is it possible to set these values differently on each of the 3 nodes?
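If so, I imagine splitting the single NodeName=server[1-3] line into one line per node, something like this (placeholder numbers):

# Placeholder per-node values
NodeName=server1 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=2 MemSpecLimit=4096
NodeName=server2 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=4 MemSpecLimit=8192
NodeName=server3 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=8 MemSpecLimit=16384

Is that the supported way to do it, or does per-node configuration need something else?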
Thanks!