[slurm-users] (no subject)

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Dec 9 07:41:54 UTC 2019


Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM

On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
> Hi Dean,
> 
> You may want to look at the links in my Slurm Wiki page.  Both the 
> official Slurm documentation and other resources are listed.  I think most 
> of your requirements and questions are described in these pages.
> 
> My Wiki gives detailed deployment information for a CentOS 7 cluster, but 
> much of this information should be relevant for Ubuntu as well.
> 
> /Ole
> 
> 
> On 06-12-2019 22:57, Dean Schulze wrote:
>> I'm doing my first Slurm installation.  The SchedMD docs assume that I 
>> have a cluster that meets certain (unstated) requirements available, but 
>> I don't.  I've found a couple of examples showing how to set up a cluster 
>> for Slurm using real hardware (nodes) with GPUs:
>>
>> https://github.com/mknoxnv/ubuntu-slurm
>> https://github.com/nateGeorge/slurm_gpu_ubuntu
>>
>> The requirements for a cluster for slurm seem to be:
>>
>>    Passwordless SSH is working between slurm controller and slurm nodes
>>    There is shared storage between all the nodes: /storage & /home (NFS)
>>    The UIDs and GIDs will be consistent between all the nodes. (LDAP or 
>> other)
>>    Hostnames have to be a FQDN.
>>    Slurm will be used to control SSH access to compute nodes.
>>    Compute nodes are DNS resolvable.
>>    Compute nodes have GPUs and the latest CUDA drivers installed
>>    Time has to be synchronized across all nodes and controller (ntp or 
>> freeipa)
>>    (If time isn't synchronized properly, the controller might not start)
>>
>>
>> My questions are:
>>
>>    Are the cluster requirements above correct and complete?
>>
>>    Can I use virtual machines without GPUs for my nodes?
>>    (This is just to get started.  Eventually I'll have real hardware 
>> with GPUs for my nodes.)
>>
>>    From the Ubuntu link on your download page I've downloaded these files:
>>
>>      slurmctld_18.08.6.2-1_amd64.deb      610.9 kB
>>      slurm-client_18.08.6.2-1_amd64.deb   887.7 kB
>>      slurm-wlm_18.08.6.2-1_amd64.deb      12.3 kB
>>
>>    The slurmctld would be installed on my controller, but what do I 
>> install on my nodes?
>>    The slurm-wlm file is very small.  Would I install it on my node? 
>> What is the client for?
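
A few of the requirements listed in the quoted message (FQDN hostnames, 
resolvable compute nodes, consistent UIDs/GIDs) can be checked before 
installing anything.  The following is only a minimal sketch and not part 
of the original thread: the function names and the layout of the passwd 
data are hypothetical, and the FQDN test is a simple heuristic (at least 
two dot-separated labels), not a full hostname validator.

```python
import socket
from typing import Dict, List, Set, Tuple


def looks_like_fqdn(hostname: str) -> bool:
    """Heuristic: at least two non-empty dot-separated labels."""
    labels = hostname.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def resolves(hostname: str) -> bool:
    """True if the local resolver (DNS or /etc/hosts) returns an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def inconsistent_ids(
    passwd_by_node: Dict[str, Dict[str, Tuple[int, int]]]
) -> Set[str]:
    """Return users whose (UID, GID) differs between nodes or is missing on one.

    passwd_by_node maps node name -> {user name: (uid, gid)}; how this data
    is collected (e.g. from `getent passwd` on each node) is left open.
    """
    all_users: Set[str] = set()
    for users in passwd_by_node.values():
        all_users |= set(users)
    bad: Set[str] = set()
    for user in all_users:
        seen = {users.get(user) for users in passwd_by_node.values()}
        if len(seen) > 1:  # differing IDs, or None for a missing account
            bad.add(user)
    return bad


def preflight(nodes: List[str]) -> List[str]:
    """Build one human-readable status line per node."""
    lines = []
    for node in nodes:
        fqdn = "FQDN" if looks_like_fqdn(node) else "short name"
        dns = "resolves" if resolves(node) else "does not resolve"
        lines.append(f"{node}: {fqdn}, {dns}")
    return lines
```

Calling preflight() with your controller and node names gives one status 
line per host, and inconsistent_ids() flags accounts that would cause 
ownership problems on shared /home.  Neither check needs GPUs, so the 
same script works on virtual machines as on real hardware.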
