[slurm-users] (no subject)

Dean Schulze dean.w.schulze at gmail.com
Fri Dec 6 21:57:59 UTC 2019


I'm doing my first Slurm installation.  The SchedMD docs assume that I
already have a cluster that meets certain (unstated) requirements, but I
don't.  I've found a couple of examples showing how to set up a cluster for
Slurm using real hardware (nodes) with GPUs:

  https://github.com/mknoxnv/ubuntu-slurm
  https://github.com/nateGeorge/slurm_gpu_ubuntu

The requirements for a Slurm cluster seem to be:

  Passwordless SSH works between the Slurm controller and the Slurm nodes.
  There is shared storage between all the nodes: /storage & /home (NFS).
  The UIDs and GIDs are consistent between all the nodes (LDAP or other).
  Hostnames have to be FQDNs.
  Slurm will be used to control SSH access to compute nodes.
  Compute nodes are DNS resolvable.
  Compute nodes have GPUs and the latest CUDA drivers installed.
  Time has to be synchronized across all nodes and the controller (ntp or
  freeipa).
  (If time isn't synchronized properly, the controller might not start.)
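
To make a few of the requirements above concrete, here is a rough sketch of
how the FQDN, DNS and UID/GID items could be checked before installing
anything.  It assumes Python 3 is available; the function names and the
sample UID data are hypothetical and are not part of Slurm or of the two
linked repositories.  The FQDN test is only a syntactic check.

```python
import socket
from typing import Dict, Set


def looks_like_fqdn(hostname: str) -> bool:
    """Syntactic check only: at least two non-empty dot-separated labels."""
    labels = hostname.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can look up the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def id_mismatches(per_node_ids: Dict[str, Dict[str, int]]) -> Dict[str, Set[int]]:
    """Given {node: {user: uid}}, return users whose UID differs between nodes."""
    seen: Dict[str, Set[int]] = {}
    for ids in per_node_ids.values():
        for user, uid in ids.items():
            seen.setdefault(user, set()).add(uid)
    return {user: uids for user, uids in seen.items() if len(uids) > 1}


if __name__ == "__main__":
    # Hypothetical data; in practice this would be collected from each node,
    # for example from the output of `getent passwd`.
    sample = {
        "controller": {"alice": 1001, "bob": 1002},
        "node1": {"alice": 1001, "bob": 1005},
    }
    name = socket.gethostname()
    print("local hostname:", name)
    print("looks like an FQDN:", looks_like_fqdn(name))
    print("resolvable:", resolves(name))
    print("UID mismatches:", id_mismatches(sample))
```

The same id_mismatches() helper could be applied to GIDs by passing
{node: {group: gid}} instead.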


My questions are:

  Are the cluster requirements above correct and complete?

  Can I use virtual machines without GPUs for my nodes?
  (This is just to get started.  Eventually I'll have real hardware with
GPUs for my nodes.)

  From the Ubuntu link on your download page I've downloaded these files:

    slurmctld_18.08.6.2-1_amd64.deb      610.9 kB
    slurm-client_18.08.6.2-1_amd64.deb   887.7 kB
    slurm-wlm_18.08.6.2-1_amd64.deb      12.3 kB

  The slurmctld package would be installed on my controller, but what do I
install on my nodes?
  The slurm-wlm file is very small.  Would I install it on my nodes?  What
is the slurm-client package for?

Thank you.