[slurm-users] (no subject)
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Dec 9 07:41:54 UTC 2019
Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM
On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
> Hi Dean,
>
> You may want to look at the links in my Slurm Wiki page. Both the
> official Slurm documentation and other resources are listed. I think most
> of your requirements and questions are described in these pages.
>
> My Wiki gives detailed deployment information for a CentOS 7 cluster, but
> much of this information should be relevant for Ubuntu as well.
>
> /Ole
>
>
> On 06-12-2019 22:57, Dean Schulze wrote:
>> I'm doing my first Slurm installation. The SchedMD docs assume that I
>> already have a cluster that meets certain (unstated) requirements, but
>> I don't. I've found a couple of examples showing how to set up a cluster
>> for Slurm using real hardware (nodes) with GPUs:
>>
>> https://github.com/mknoxnv/ubuntu-slurm
>> https://github.com/nateGeorge/slurm_gpu_ubuntu
>>
>> The requirements for a cluster for Slurm seem to be:
>>
>> - Passwordless SSH is working between the Slurm controller and the nodes.
>> - There is shared storage between all the nodes: /storage & /home (NFS).
>> - The UIDs and GIDs are consistent between all the nodes (LDAP or other).
>> - Hostnames have to be FQDNs.
>> - Slurm will be used to control SSH access to compute nodes.
>> - Compute nodes are DNS resolvable.
>> - Compute nodes have GPUs and the latest CUDA drivers installed.
>> - Time has to be synchronized across all nodes and the controller (NTP or
>>   FreeIPA). (If time isn't synchronized properly, the controller might
>>   not start.)
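
Two of the items in the list above (hostnames being FQDNs, and compute nodes being DNS resolvable) can be spot-checked with a few lines of Python. This is only a rough sketch, not anything from the Slurm documentation: the function names are made up for illustration, the FQDN test is a simple "has at least two dot-separated labels" heuristic, and the resolver is passed in as a parameter so the check can be tried without a real DNS setup.

```python
import socket


def looks_like_fqdn(name):
    """Heuristic only: at least two non-empty, dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def resolves(name, resolver=socket.gethostbyname):
    """Return True if the resolver finds an address for the name."""
    try:
        resolver(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def check_nodes(names, resolver=socket.gethostbyname):
    """Map each node name to its FQDN and resolution check results."""
    return {
        name: {"fqdn": looks_like_fqdn(name), "resolves": resolves(name, resolver)}
        for name in names
    }
```

Calling check_nodes() with the controller and node names (for example a hypothetical ["ctl.example.com", "node1.example.com"]) on each machine shows which names fail either check from that machine's point of view.
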
>>
>>
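
The UID/GID consistency item in the list above can be checked in a similar way. The sketch below assumes you have collected passwd-format text (for example the output of "getent passwd") from each machine by whatever means you like; how it is collected is left out. The function names are hypothetical. It reports every account whose (UID, GID) pair differs between machines, or which is missing on some of them.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a passwd-format line
        entries[fields[0]] = (int(fields[2]), int(fields[3]))
    return entries


def find_mismatches(per_node):
    """per_node maps node name -> passwd text.

    Returns user -> {node: (uid, gid) or None} for every user whose
    IDs are not identical on all nodes (None means the user is missing).
    """
    parsed = {node: parse_passwd(text) for node, text in per_node.items()}
    users = set().union(*(p.keys() for p in parsed.values()))
    mismatches = {}
    for user in sorted(users):
        seen = {node: p.get(user) for node, p in parsed.items()}
        if len(set(seen.values())) > 1:
            mismatches[user] = seen
    return mismatches
```

Local system accounts may legitimately differ between machines, so the result is best read as a list of accounts to look at rather than a list of errors.
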
>> My questions are:
>>
>> Are the cluster requirements above correct and complete?
>>
>> Can I use virtual machines without GPUs for my nodes?
>> (This is just to get started. Eventually I'll have real hardware
>> with GPUs for my nodes.)
>>
>> From the Ubuntu link on your download page I've downloaded these files:
>>
>> slurmctld_18.08.6.2-1_amd64.deb 610.9 kB
>> slurm-client_18.08.6.2-1_amd64.deb 887.7 kB
>> slurm-wlm_18.08.6.2-1_amd64.deb 12.3 kB
>>
>> The slurmctld would be installed on my controller, but what do I
>> install on my nodes?
>> The slurm-wlm file is very small. Would I install it on my node?
>> What is the client for?