[slurm-users] (no subject)

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Sun Dec 8 20:18:42 UTC 2019


Hi Dean,

You may want to look at the links in my Slurm Wiki page.  It lists both the 
official Slurm documentation and other resources.  I think most of your 
requirements and questions are addressed in those pages.

My Wiki gives detailed deployment information for a CentOS 7 cluster, 
but much of this information should be relevant for Ubuntu as well.
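A sketch, not taken from the Wiki: one requirement in the quoted message below, consistent UIDs and GIDs on every node, can be spot-checked by comparing the passwd databases collected from each node. The function names `parse_passwd` and `find_mismatches` are hypothetical. The script assumes you have already gathered each node's `getent passwd` output yourself. It only compares text, does not contact any node, and checks only each user's UID and primary GID (not /etc/group).

```python
"""Hypothetical helper: compare passwd(5)-formatted text gathered from
several nodes and report users whose UID or primary GID differs.

Assumption: each node's `getent passwd` output (or /etc/passwd) has already
been collected into a string; how you collect it is up to you."""
from typing import Dict, Tuple


def parse_passwd(text: str) -> Dict[str, Tuple[int, int]]:
    """Map user name -> (uid, gid) from passwd(5)-formatted text."""
    users: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a valid passwd entry
        name, _password, uid, gid = fields[:4]
        users[name] = (int(uid), int(gid))
    return users


def find_mismatches(
    per_node: Dict[str, str]
) -> Dict[str, Dict[str, Tuple[int, int]]]:
    """Return {user: {node: (uid, gid)}} for every user whose IDs differ
    between nodes. Users present on only some nodes are compared only
    among the nodes where they exist."""
    parsed = {node: parse_passwd(text) for node, text in per_node.items()}
    all_users = set().union(*(p.keys() for p in parsed.values()))
    mismatches: Dict[str, Dict[str, Tuple[int, int]]] = {}
    for user in sorted(all_users):
        ids = {node: p[user] for node, p in parsed.items() if user in p}
        if len(set(ids.values())) > 1:  # more than one distinct (uid, gid)
            mismatches[user] = ids
    return mismatches
```

For example, if the `slurm` user had been created separately on two nodes and received different UIDs, `find_mismatches` would return that user together with each node's (UID, GID) pair, while users with identical IDs everywhere are left out.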

/Ole


On 06-12-2019 22:57, Dean Schulze wrote:
> I'm doing my first Slurm installation.  The SchedMD docs assume that I 
> already have a cluster available that meets certain (unstated) 
> requirements, but I don't.  I've found a couple of examples showing how 
> to set up a cluster for Slurm using real hardware (nodes) with GPUs:
> 
> https://github.com/mknoxnv/ubuntu-slurm
> https://github.com/nateGeorge/slurm_gpu_ubuntu
> 
> The requirements for a Slurm cluster seem to be:
> 
>    Passwordless SSH is working between the Slurm controller and the Slurm nodes.
>    There is shared storage between all the nodes: /storage & /home (NFS).
>    UIDs and GIDs are consistent across all the nodes (LDAP or other).
>    Hostnames have to be FQDNs.
>    Slurm will be used to control SSH access to compute nodes.
>    Compute nodes are DNS resolvable.
>    Compute nodes have GPUs and the latest CUDA drivers installed.
>    Time has to be synchronized across all nodes and the controller (NTP or FreeIPA).
>    (If time isn't synchronized properly, the controller might not start.)
> 
> 
> My questions are:
> 
>    Are the cluster requirements above correct and complete?
> 
>    Can I use virtual machines without GPUs for my nodes?
>    (This is just to get started.  Eventually I'll have real hardware with GPUs for my nodes.)
> 
>    From the Ubuntu link on your download page I've downloaded these files:
> 
>      slurmctld_18.08.6.2-1_amd64.deb      610.9 kB
>      slurm-client_18.08.6.2-1_amd64.deb   887.7 kB
>      slurm-wlm_18.08.6.2-1_amd64.deb      12.3 kB
> 
>    The slurmctld package would be installed on my controller, but what do I install on my nodes?
>    The slurm-wlm file is very small.  Would I install it on my nodes?  What is the client for?
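The time-synchronization requirement in the quoted list above can be spot-checked from the controller and from each node with a small SNTP query. This is a minimal sketch: `pool.ntp.org` is only a placeholder for whatever NTP server your cluster actually syncs to, only IPv4 is handled, and the function names are hypothetical. It measures the offset and leaves the actual synchronization to a daemon such as chrony or ntpd.

```python
"""Minimal SNTP client sketch: estimate this host's clock offset against an
NTP server. Diagnostic only; it never changes the system clock."""
import socket
import struct
import time

NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix time."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP offset estimate.
    t0: client transmit, t1: server receive,
    t2: server transmit, t3: client receive.
    A positive result means the local clock is behind the server."""
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server: str = "pool.ntp.org", timeout: float = 2.0) -> float:
    """Send one SNTP client request over IPv4 and return the offset in seconds.
    The default server name is a placeholder assumption."""
    # First byte 0x1B: leap indicator 0, version 3, mode 3 (client).
    request = b"\x1b" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(request, (server, 123))
        data, _addr = sock.recvfrom(1024)
        t3 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP reply")
    # Receive timestamp is at bytes 32..39, transmit timestamp at 40..47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    t1 = ntp_to_unix(rx_s, rx_f)
    t2 = ntp_to_unix(tx_s, tx_f)
    return clock_offset(t0, t1, t2, t3)
```

Running `query_offset()` on the controller and on each node against the same server and comparing the results gives a rough idea of how far apart their clocks are. The offset formula is the one defined in the NTPv4 specification (RFC 5905).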


