Hi,
I am a complete Slurm-admin and sys-admin noob trying to set up a 3-node Slurm cluster. I have managed to get a minimal working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 Ti) as a GRES.
This is *slurm.conf* without the comment lines:
root@server1:/etc/slurm# grep -v "#" slurm.conf
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
This is *gres.conf* (only one line); on each node, NodeName is set to that node's own name:
root@server1:/etc/slurm# cat gres.conf
NodeName=server1 Name=gpu File=/dev/nvidia0
Those are the only config files I have.
I have a few general questions, loosely arranged in ascending order of generality:
1) I have enabled the allocation of GPU resources as a GRES and have tested this by running:
shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
2: server3
0: server1
1: server2
Is this a good way to check if the configs have worked correctly? How else can I easily check if the GPU GRES has been properly configured?
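For reference, these are the kinds of additional checks I have in mind. This is a sketch, not something I have validated: the function name `check_gpu_gres` is made up for illustration, and I am assuming that Slurm exports `CUDA_VISIBLE_DEVICES` for jobs that are allocated a GPU GRES.

```shell
#!/bin/bash
# Hypothetical helper: three independent looks at the GPU GRES.
check_gpu_gres() {
    # 1. What the controller believes each node offers (%G = generic resources).
    sinfo -N -o "%N %G"

    # 2. The full record for one node; look for "Gres=gpu:1" and the node state.
    scontrol show node server1 | grep -i -E 'gres|state'

    # 3. Run inside a GPU allocation and print what the job itself can see.
    #    Assumption: CUDA_VISIBLE_DEVICES is set by Slurm for GPU jobs.
    srun --gpus=1 bash -c \
        'echo "$(hostname): CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'
}
```

Calling `check_gpu_gres` on server1 runs all three steps. Unlike `hostname`, the last step actually touches the GPU, so it should fail or print nothing useful if the GRES is misconfigured.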
2) I want to reserve a few CPU cores and a few gigabytes of memory for tasks not related to Slurm. According to the documentation, I should use CoreSpecCount https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount and MemSpecLimit https://slurm.schedmd.com/slurm.conf.html#OPT_MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these resources (the default) or prevented from using these resources". How do I change this default behaviour so that the config specifies the cores reserved for non-Slurm work, instead of specifying how many cores Slurm can use?
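For concreteness, this is roughly what I expect the change to look like. It is a sketch, not a tested config: the values 4 (cores) and 8192 (MB) are placeholders I picked for illustration, and I am assuming that `SlurmdOffSpec` under `TaskPluginParam` is the option the quoted sentence refers to.

```conf
# slurm.conf (sketch; 4 cores and 8192 MB are placeholder values)
# Assumption: this keeps slurmd itself off the specialized cores.
TaskPluginParam=SlurmdOffSpec
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1 CoreSpecCount=4 MemSpecLimit=8192
```

If I have understood the documentation correctly, the only difference from my current node line is the two extra parameters at the end, plus the one `TaskPluginParam` line.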
3) While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules', but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?
4) A very broad question: while managing the resources used by a program, Slurm might split the work across multiple computers that do not necessarily have the files the program needs in order to run. For example, a Python script might require the package 'numpy', but that package might not be installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? Following on from question 3, if I had a Python script that users usually run with a command like 'python3 someScript.py' instead of inside a conda environment, how should I enable Slurm to manage the resources required by this script? Would I have to install all the packages required by this script on all the computers in the cluster?
5) Related to the previous question: I have set up my 3 nodes so that all the users' home directories are stored on a Ceph cluster https://en.wikipedia.org/wiki/Ceph_(software) created from the hard drives of all 3 nodes. This means that a user's home directory is mounted at the same location on all 3 computers, making the user's data visible to all 3 nodes. Does this make managing a program's dependencies, as described in the previous question, easier? I realise that reading and writing files on the hard drives of a Ceph cluster is not particularly fast, so I am planning to have users use the /tmp/ directory for speed-critical reading and writing, as the operating systems are installed on NVMe drives.
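To make questions 3 to 5 concrete, this is the kind of submission script I imagine users writing. It is only a sketch under several assumptions: conda is installed in the shared home directory at `$HOME/miniconda3` (a hypothetical path), the file name `job.sbatch` and the job name are made up, and `someScript.py` writes its output to the current directory.

```shell
# Write a hypothetical batch script to job.sbatch (quoted EOF: nothing is expanded here).
cat > job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=conda-test
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --output=%x-%j.out

# Assumption: conda lives in the shared home directory, so every node sees
# the same install and the same environments without 'module load conda'.
source "$HOME/miniconda3/etc/profile.d/conda.sh"
conda activate myEnv

# Node-local scratch under /tmp (NVMe) for speed-critical reads and writes.
SCRATCH=$(mktemp -d "/tmp/${USER}.${SLURM_JOB_ID}.XXXXXX")
trap 'rm -rf "$SCRATCH"' EXIT
cd "$SCRATCH"

python3 "$HOME/someScript.py"

# Copy whatever the script produced back to the shared submit directory.
cp -r "$SCRATCH"/. "$SLURM_SUBMIT_DIR"/
EOF
```

The script would then be submitted with `sbatch job.sbatch`. The idea is that the environment comes from the shared home directory, while heavy I/O stays on the local drive until the final copy.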