[slurm-users] (no subject)
Dean Schulze
dean.w.schulze at gmail.com
Fri Dec 6 21:57:59 UTC 2019
I'm doing my first Slurm installation. The SchedMD docs assume that I already
have a cluster available that meets certain (unstated) requirements, but I
don't. I've found a couple of examples showing how to set up a cluster for
Slurm using real hardware (nodes) with GPUs:
https://github.com/mknoxnv/ubuntu-slurm
https://github.com/nateGeorge/slurm_gpu_ubuntu
The requirements for a Slurm cluster seem to be:
- Passwordless SSH works between the Slurm controller and the Slurm nodes.
- There is shared storage between all the nodes: /storage and /home (NFS).
- UIDs and GIDs are consistent across all the nodes (LDAP or other).
- Hostnames have to be FQDNs.
- Slurm will be used to control SSH access to compute nodes.
- Compute nodes are DNS-resolvable.
- Compute nodes have GPUs and the latest CUDA drivers installed.
- Time has to be synchronized across all nodes and the controller (NTP or
  FreeIPA). (If time isn't synced properly, the controller might not start.)
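Two of the requirements above, consistent UIDs/GIDs and FQDN hostnames, can be checked mechanically before installing anything. Below is a minimal Python sketch, assuming you have already collected the output of `getent passwd` (or the contents of /etc/passwd) from each node onto one machine. The function names `is_fqdn`, `parse_passwd` and `uid_gid_mismatches` are hypothetical helpers, not part of Slurm. The sketch only checks that a name looks like an FQDN, not that DNS actually resolves it, and it does not report users that are missing from some nodes.

```python
"""Hypothetical pre-flight checks: FQDN-shaped hostnames and consistent UIDs/GIDs."""

import re

# One DNS label: 1-63 alphanumerics or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_fqdn(name: str) -> bool:
    """Return True if `name` looks like a fully qualified name (two or more labels)."""
    name = name.rstrip(".")  # tolerate a trailing root dot
    labels = name.split(".")
    return len(name) <= 253 and len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def parse_passwd(text: str) -> dict[str, tuple[int, int]]:
    """Map user name -> (UID, GID) from passwd-formatted text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # passwd format: name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def uid_gid_mismatches(passwd_by_node: dict[str, str]) -> dict[str, dict[str, tuple[int, int]]]:
    """Return users whose (UID, GID) differ between nodes, keyed by user, then node.

    Only nodes on which the user exists are compared.
    """
    parsed = {node: parse_passwd(text) for node, text in passwd_by_node.items()}
    all_users = set().union(*(p.keys() for p in parsed.values()))
    mismatches = {}
    for user in sorted(all_users):
        ids = {node: p[user] for node, p in parsed.items() if user in p}
        if len(set(ids.values())) > 1:
            mismatches[user] = ids
    return mismatches


if __name__ == "__main__":
    # Illustrative data only: alice has a different UID on node2.
    nodes = {
        "ctl.example.com": "alice:x:1001:1001::/home/alice:/bin/bash\n",
        "node1.example.com": "alice:x:1001:1001::/home/alice:/bin/bash\n",
        "node2.example.com": "alice:x:1002:1001::/home/alice:/bin/bash\n",
    }
    print([n for n in nodes if not is_fqdn(n)])  # prints []
    print(uid_gid_mismatches(nodes))
    # {'alice': {'ctl.example.com': (1001, 1001), 'node1.example.com': (1001, 1001), 'node2.example.com': (1002, 1001)}}
```

Comparing the collected text in one place keeps the check independent of how users are provisioned (local files or LDAP), as long as every node's user database can be dumped in the same format.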
My questions are:
- Are the cluster requirements above correct and complete?
- Can I use virtual machines without GPUs for my nodes? (This is just to
  get started. Eventually I'll have real hardware with GPUs for my nodes.)
From the Ubuntu link on your download page I've downloaded these files:
slurmctld_18.08.6.2-1_amd64.deb 610.9 kB
slurm-client_18.08.6.2-1_amd64.deb 887.7 kB
slurm-wlm_18.08.6.2-1_amd64.deb 12.3 kB
The slurmctld package would be installed on my controller, but what do I
install on my nodes?
The slurm-wlm file is very small. Would I install it on my nodes? What is
slurm-client for?
Thank you.