Hi,
I am a complete slurm-admin and sys-admin noob trying to set up a 3-node Slurm cluster. I have managed to get a minimum working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 Ti) as a GRES.
This is *slurm.conf* without the comment lines:
root@server1:/etc/slurm# grep -v "#" slurm.conf
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
This is *gres.conf* (only one line), each node has been assigned its corresponding NodeName:
root@server1:/etc/slurm# cat gres.conf
NodeName=server1 Name=gpu File=/dev/nvidia0
Those are the only config files I have.
I have a few general questions, loosely arranged in ascending order of generality:
1) I have enabled the allocation of GPU resources as a GRES and have tested this by running:
shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
2: server3
0: server1
1: server2
Is this a good way to check if the configs have worked correctly? How else can I easily check if the GPU GRES has been properly configured?
2) I want to reserve a few CPU cores and a few gigs of memory for use by non-Slurm-related tasks. According to the documentation, I am to use CoreSpecCount https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount and MemSpecLimit https://slurm.schedmd.com/slurm.conf.html#OPT_MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these resources (the default) or prevented from using these resources". How do I change this default behaviour so that the config specifies the cores reserved for non-Slurm work rather than how many cores Slurm can use?
3) While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules', but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?
4) A very broad question: while managing the resources being used by a program, Slurm might happen to split the resources across multiple computers that do not necessarily have the files required by this program to run. For example, a Python script might require the package 'numpy', but that package might not be installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? Following on from my previous question, if I had a Python script that users usually run simply with a command like 'python3 someScript.py' instead of running it within a conda environment, how should I enable Slurm to manage the resources required by this script? Would I have to install all the packages required by this script on every computer in the cluster?
5) Related to the previous question: I have set up my 3 nodes so that all the users' home directories are stored on a Ceph cluster https://en.wikipedia.org/wiki/Ceph_(software) created using the hard drives from all 3 nodes, which essentially means that a user's home directory is mounted at the same location on all 3 computers, making a user's data visible to all 3 nodes. Does this make the process of managing the dependencies of a program, as described in the previous question, easier? I realise that reading and writing files on the hard drives of a Ceph cluster is not particularly fast, so I am planning on having users use the /tmp/ directory for speed-critical reading and writing, as the OSs have been installed on NVMe drives.
Hi,
Shooktija S N via slurm-users slurm-users@lists.schedmd.com writes:
Hi,
I am a complete slurm-admin and sys-admin noob trying to set up a 3 node Slurm cluster. I have managed to get a minimum working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 ti) as a GRES.
This is slurm.conf without the comment lines:

root@server1:/etc/slurm# grep -v "#" slurm.conf
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP

This is gres.conf (only one line), each node has been assigned its corresponding NodeName:

root@server1:/etc/slurm# cat gres.conf
NodeName=server1 Name=gpu File=/dev/nvidia0

Those are the only config files I have.
I have a few general questions, loosely arranged in ascending order of generality:
- I have enabled the allocation of GPU resources as a GRES and have tested this by running:
shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
2: server3
0: server1
1: server2

Is this a good way to check if the configs have worked correctly? How else can I easily check if the GPU GRES has been properly configured?
What do you mean by 'properly configured'? Ultimately you will want to submit a job to the nodes and use something like 'nvidia-smi' to see whether the GPUs are actually being used.
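For example, a minimal test job along these lines (the script contents are just an illustration) should show which GPU Slurm actually handed to the job:

#!/bin/bash
#SBATCH --job-name=gpu-check
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

# Slurm sets CUDA_VISIBLE_DEVICES for jobs that are allocated a GPU GRES
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
# list the GPUs visible to the job
nvidia-smi -L

You can also ask Slurm directly what it thinks is configured, e.g. 'scontrol show node server1 | grep -i gres' or 'sinfo -N -o "%N %G"'.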
- I want to reserve a few CPU cores, and a few gigs of memory for use by non slurm related tasks. According to the documentation, I am to use
CoreSpecCount and MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these resources (the default) or prevented from using these resources", how do I change this default behaviour to have the config specify the cores reserved for non slurm stuff instead of specifying how many cores slurm can use?
I am not aware that this is possible.
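The reservation itself, though, is simply declared on the node line in slurm.conf, something like this (the numbers are purely illustrative):

NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 Gres=gpu:1 State=UNKNOWN

As far as I know, enforcing the memory part relies on the task/cgroup plugin constraining memory, and AllowSpecResourcesUsage controls whether jobs may explicitly request the specialized cores with --core-spec.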
- While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before
running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules', but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?
Environment modules and Conda are somewhat orthogonal to each other.
Environment modules is a mechanism for manipulating environment variables such as PATH and LD_LIBRARY_PATH. It allows you to provide easy access for all users to software which has been centrally installed in non-standard paths. It is not used to provide access to software installed via 'apt'.
Conda is another approach to providing non-standard software, but is usually used by individual users to install programs in their own home directories.
You can use environment modules to allow access to a different version of Conda than the one you get via 'apt', but there is no necessity to do that.
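If Conda is simply installed under each user's home directory (or anywhere on the shared file system), the batch script does not need 'module load conda' at all; sourcing Conda's own setup script is enough. A sketch, with the install path as a placeholder:

#!/bin/bash
#SBATCH --job-name=conda-job

# path depends on where (Mini)conda was installed; adjust accordingly
source ~/miniconda3/etc/profile.d/conda.sh
conda activate myEnv
python3 someScript.py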
- A very broad question: while managing the resources being used by a program, slurm might happen to split the resources across multiple computers that
might not necessarily have the files required by this program to run. For example, a python script that requires the package 'numpy' to function but that package was not installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? In my previous question, if I had a python script that users usually run just by running a command like 'python3 someScript.py' instead of running it within a conda environment, how should I enable slurm to manage the resources required by this script? Would I have to install all the packages required by this script on all the computers that are in the cluster?
In general a distributed or cluster file system, such as NFS, Ceph or Lustre, is used to provide access to the same files from multiple nodes. /home would be on such a file system, as would a large part of the software. You can use something like EasyBuild, which will install software and generate the relevant module files.
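For the plain 'python3 someScript.py' case, one simple pattern (paths below are just examples) is a virtual environment created once somewhere on the shared /home, so that every node sees the same packages:

# run once, from any node
python3 -m venv ~/venvs/myproject
~/venvs/myproject/bin/pip install numpy

# then in the batch script, use that environment's interpreter
srun ~/venvs/myproject/bin/python3 someScript.py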
- Related to the previous question: I have set up my 3 nodes in such a way that all the users' home directories are stored on a ceph cluster created using the
hard drives from all the 3 nodes, which essentially means that a user's home directory is mounted at the same location on all 3 computers - making a user's data visible to all 3 nodes. Does this make the process of managing the dependencies of a program as described in the previous question easier? I realise that programs having to read and write to files on the hard drives of a ceph cluster is not really the fastest so I am planning on having users use the /tmp/ directory for speed critical reading and writing, as the OSs have been installed on NVME drives.
Depending on the IO patterns created by a piece of software, using the distributed file system might be fine, or a local disk might be needed. Note that you might experience problems with /tmp filling up, so it may be better to have a separate /localscratch. In general you probably also want people to use as much RAM as possible in order to avoid file system IO altogether, if this is feasible.
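As a rough sketch of that pattern (file and directory names are only placeholders), a batch script could stage its working data on the node-local NVMe and copy the results back to the Ceph-backed home directory at the end:

#!/bin/bash
#SBATCH --job-name=scratch-example

# per-job directory on the local disk; /tmp/$SLURM_JOB_ID would also work
SCRATCH=/localscratch/$SLURM_JOB_ID
mkdir -p "$SCRATCH"
cp ~/input.dat "$SCRATCH"/
cd "$SCRATCH"

python3 ~/someScript.py input.dat

# copy results back to the shared home directory and clean up the scratch space
cp results.dat ~/
rm -rf "$SCRATCH"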
HTH
Loris