Hi,
I am a complete slurm-admin and sys-admin noob trying to set up a 3-node Slurm cluster. I have managed to get a minimum working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 Ti) as a GRES.
This is *slurm.conf* without the comment lines:
root@server1:/etc/slurm# grep -v "#" slurm.conf
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
This is *gres.conf* (only one line), each node has been assigned its corresponding NodeName:
root@server1:/etc/slurm# cat gres.conf
NodeName=server1 Name=gpu File=/dev/nvidia0
Those are the only config files I have.
I have a few general questions, loosely arranged in ascending order of generality:
1) I have enabled the allocation of GPU resources as a GRES and have tested this by running:
shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
2: server3
0: server1
1: server2
Is this a good way to check if the configs have worked correctly? How else can I easily check if the GPU GRES has been properly configured?
2) I want to reserve a few CPU cores and a few gigs of memory for use by non-Slurm-related tasks. According to the documentation, I am to use CoreSpecCount https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount and MemSpecLimit https://slurm.schedmd.com/slurm.conf.html#OPT_MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these resources (the default) or prevented from using these resources". How do I change this default behaviour so that the config specifies the cores reserved for non-Slurm work rather than how many cores Slurm can use?
3) While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules', but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?
4) A very broad question: while managing the resources being used by a program, Slurm might happen to split the resources across multiple computers that do not necessarily have the files required by this program to run. For example, a Python script might require the package 'numpy', but that package might not be installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? Following on from my previous question, if I had a Python script that users usually run simply with a command like 'python3 someScript.py' instead of running it within a conda environment, how should I enable Slurm to manage the resources required by this script? Would I have to install all the packages required by this script on every computer in the cluster?
5) Related to the previous question: I have set up my 3 nodes so that all the users' home directories are stored on a Ceph cluster https://en.wikipedia.org/wiki/Ceph_(software) created using the hard drives from all 3 nodes, which essentially means that a user's home directory is mounted at the same location on all 3 computers, making a user's data visible to all 3 nodes. Does this make the process of managing the dependencies of a program, as described in the previous question, easier? I realise that reading and writing files on the hard drives of a Ceph cluster is not particularly fast, so I am planning on having users use the /tmp/ directory for speed-critical reading and writing, as the OSs have been installed on NVMe drives.
Hi,
Shooktija S N via slurm-users slurm-users@lists.schedmd.com writes:
Hi,
I am a complete slurm-admin and sys-admin noob trying to set up a 3 node Slurm cluster. I have managed to get a minimum working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 ti) as a GRES.
This is slurm.conf without the comment lines:

root@server1:/etc/slurm# grep -v "#" slurm.conf
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP

This is gres.conf (only one line), each node has been assigned its corresponding NodeName:

root@server1:/etc/slurm# cat gres.conf
NodeName=server1 Name=gpu File=/dev/nvidia0

Those are the only config files I have.
I have a few general questions, loosely arranged in ascending order of generality:
- I have enabled the allocation of GPU resources as a GRES and have tested this by running:
shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
2: server3
0: server1
1: server2

Is this a good way to check if the configs have worked correctly? How else can I easily check if the GPU GRES has been properly configured?
What do you mean by 'properly configured'? Ultimately you will want to submit a job to the nodes and use something like 'nvidia-smi' to see whether the GPUs are actually being used.
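For example, a minimal test job along these lines (the script contents are just an illustration) should show which GPU Slurm actually handed to the job:

#!/bin/bash
#SBATCH --job-name=gpu-check
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

# Slurm sets CUDA_VISIBLE_DEVICES for jobs that are allocated a GPU GRES
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
# list the GPUs visible to the job
nvidia-smi -L

You can also ask Slurm directly what it thinks is configured, e.g. 'scontrol show node server1 | grep -i gres' or 'sinfo -N -o "%N %G"'.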
- I want to reserve a few CPU cores, and a few gigs of memory for use by non slurm related tasks. According to the documentation, I am to use
CoreSpecCount and MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these resources (the default) or prevented from using these resources", how do I change this default behaviour to have the config specify the cores reserved for non slurm stuff instead of specifying how many cores slurm can use?
I am not aware that this is possible.
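The reservation itself, though, is simply declared on the node line in slurm.conf, something like this (the numbers are purely illustrative):

NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 Gres=gpu:1 State=UNKNOWN

As far as I know, enforcing the memory part relies on the task/cgroup plugin constraining memory, and AllowSpecResourcesUsage controls whether jobs may explicitly request the specialized cores with --core-spec.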
- While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before
running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules', but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?
Environment modules and Conda are somewhat orthogonal to each other.
Environment modules is a mechanism for manipulating environment variables such as PATH and LD_LIBRARY_PATH. It allows you to provide easy access for all users to software which has been centrally installed in non-standard paths. It is not used to provide access to software installed via 'apt'.
Conda is another approach to providing non-standard software, but is usually used by individual users to install programs in their own home directories.
You can use environment modules to allow access to a different version of Conda than the one you get via 'apt', but there is no necessity to do that.
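If Conda is simply installed under each user's home directory (or anywhere on the shared file system), the batch script does not need 'module load conda' at all; sourcing Conda's own setup script is enough. A sketch, with the install path as a placeholder:

#!/bin/bash
#SBATCH --job-name=conda-job

# path depends on where (Mini)conda was installed; adjust accordingly
source ~/miniconda3/etc/profile.d/conda.sh
conda activate myEnv
python3 someScript.py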
- A very broad question: while managing the resources being used by a program, slurm might happen to split the resources across multiple computers that
might not necessarily have the files required by this program to run. For example, a python script that requires the package 'numpy' to function but that package was not installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? In my previous question, if I had a python script that users usually run just by running a command like 'python3 someScript.py' instead of running it within a conda environment, how should I enable slurm to manage the resources required by this script? Would I have to install all the packages required by this script on all the computers that are in the cluster?
In general a distributed or cluster file system, such as NFS, Ceph or Lustre, is used to provide access to the same files from multiple nodes. /home would be on such a file system, as would a large part of the software. You can use something like EasyBuild, which will install software and generate the relevant module files.
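For the plain 'python3 someScript.py' case, one simple pattern (paths below are just examples) is a virtual environment created once somewhere on the shared /home, so that every node sees the same packages:

# run once, from any node
python3 -m venv ~/venvs/myproject
~/venvs/myproject/bin/pip install numpy

# then in the batch script, use that environment's interpreter
srun ~/venvs/myproject/bin/python3 someScript.py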
- Related to the previous question: I have set up my 3 nodes in such a way that all the users' home directories are stored on a ceph cluster created using the
hard drives from all the 3 nodes, which essentially means that a user's home directory is mounted at the same location on all 3 computers - making a user's data visible to all 3 nodes. Does this make the process of managing the dependencies of a program as described in the previous question easier? I realise that programs having to read and write to files on the hard drives of a ceph cluster is not really the fastest so I am planning on having users use the /tmp/ directory for speed critical reading and writing, as the OSs have been installed on NVME drives.
Depending on the IO patterns created by a piece of software, using the distributed file system might be fine, or a local disk might be needed. Note that you might experience problems with /tmp filling up, so it may be better to have a separate /localscratch. In general you probably also want people to use as much RAM as possible in order to avoid file system IO altogether, if this is feasible.
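As a rough sketch of that pattern (file and directory names are only placeholders), a batch script could stage its working data on the node-local NVMe and copy the results back to the Ceph-backed home directory at the end:

#!/bin/bash
#SBATCH --job-name=scratch-example

# per-job directory on the local disk; /tmp/$SLURM_JOB_ID would also work
SCRATCH=/localscratch/$SLURM_JOB_ID
mkdir -p "$SCRATCH"
cp ~/input.dat "$SCRATCH"/
cd "$SCRATCH"

python3 ~/someScript.py input.dat

# copy results back to the shared home directory and clean up the scratch space
cp results.dat ~/
rm -rf "$SCRATCH"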
HTH
Loris