[slurm-users] Ubuntu Cluster with Slurm

Abhinandan Patil abhinandan_patil_1414 at yahoo.com
Wed May 13 13:42:34 UTC 2020


Dear All,

Preamble
----------
I want to form a simple cluster with three laptops:
abhi-Latitude-E6430  //This serves as the controller
abhi-Lenovo-ideapad-330-15IKB //Compute Node
abhi-HP-EliteBook-840-G2 //Compute Node


Aim
-------------
I want to make use of the CPU, GPU and RAM on all the machines when I execute Java or Python programs.


Implementation
------------------------
Now let us look at slurm.conf.

On machine abhi-Latitude-E6430:

ClusterName=linux
ControlMachine=abhi-Latitude-E6430
SlurmUser=abhi
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
SwitchType=switch/none
StateSaveLocation=/tmp
MpiDefault=none
ProctrackType=proctrack/pgid
NodeName=abhi-Lenovo-ideapad-330-15IKB RealMemory=12000 CPUs=2
NodeName=abhi-HP-EliteBook-840-G2 RealMemory=14000 CPUs=2
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

The same slurm.conf is copied to all the machines.
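
To double-check what the file above actually defines, here is a minimal sketch in Python that pulls the NodeName lines out of the slurm.conf text. The helper name parse_nodes is hypothetical, and the parsing is deliberately simplified: it assumes one node per NodeName line, no hostlist ranges such as node[1-4], and no line continuations. It is not how Slurm itself reads the file.

```python
# Simplified slurm.conf reader. parse_nodes is a hypothetical helper,
# not part of Slurm; the text below is an excerpt of the file shown above.
SLURM_CONF = """\
ClusterName=linux
ControlMachine=abhi-Latitude-E6430
NodeName=abhi-Lenovo-ideapad-330-15IKB RealMemory=12000 CPUs=2
NodeName=abhi-HP-EliteBook-840-G2 RealMemory=14000 CPUs=2
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
"""


def parse_nodes(conf_text):
    """Return {node_name: {key: value}} for every NodeName line."""
    nodes = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.lower().startswith("nodename="):
            continue
        # Each whitespace-separated item is Key=Value; keys are lowercased
        # so the lookup below does not depend on how the file spells them.
        fields = {}
        for item in line.split():
            key, value = item.split("=", 1)
            fields[key.lower()] = value
        name = fields.pop("nodename")
        nodes[name] = fields
    return nodes


nodes = parse_nodes(SLURM_CONF)
for name, attrs in nodes.items():
    print(name, attrs)
```

For the file above this lists both compute nodes, so both are defined in the configuration.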


Observations
--------------------------------------
Now when I check the service status on each machine, all three daemons are running:
abhi at abhi-HP-EliteBook-840-G2:~$ service slurmd status
● slurmd.service - Slurm node daemon
     Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2020-05-13 18:50:01 IST; 1min 49s ago
       Docs: man:slurmd(8)
    Process: 98235 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 98253 (slurmd)
      Tasks: 2
     Memory: 2.2M
     CGroup: /system.slice/slurmd.service
             └─98253 /usr/sbin/slurmd

abhi at abhi-Lenovo-ideapad-330-15IKB:~$ service slurmd status
● slurmd.service - Slurm node daemon
     Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2020-05-13 18:50:20 IST; 8s ago
       Docs: man:slurmd(8)
    Process: 71709 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 71734 (slurmd)
      Tasks: 2
     Memory: 2.0M
     CGroup: /system.slice/slurmd.service
             └─71734 /usr/sbin/slurmd

abhi at abhi-Latitude-E6430:~$ service slurmctld status 
● slurmctld.service - Slurm controller daemon
     Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2020-05-13 18:48:58 IST; 4min 56s ago
       Docs: man:slurmctld(8)
    Process: 97114 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 97116 (slurmctld)
      Tasks: 7
     Memory: 2.6M
     CGroup: /system.slice/slurmctld.service
             └─97116 /usr/sbin/slurmctld

             
However, sinfo on the controller shows only one node, and it is down:
abhi at abhi-Latitude-E6430:~$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  down* abhi-Lenovo-ideapad-330-15IKB
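
The gap between the configuration and this output can be made explicit with a small sketch in Python that compares the nodes reported by sinfo against the two compute nodes from slurm.conf. The names parse_sinfo and missing_nodes are hypothetical helpers, the sinfo text is pasted from the output above, and the parsing assumes the default six-column layout with plain hostnames in NODELIST (no bracketed ranges).

```python
# Compare the nodes defined in slurm.conf with the nodes shown by sinfo.
# parse_sinfo and missing_nodes are hypothetical helpers, not Slurm tools.
EXPECTED = {
    "abhi-Lenovo-ideapad-330-15IKB",
    "abhi-HP-EliteBook-840-G2",
}

SINFO_OUTPUT = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  down* abhi-Lenovo-ideapad-330-15IKB
"""


def parse_sinfo(text):
    """Return {node_name: state} from default-format sinfo output."""
    states = {}
    for line in text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 6:
            continue  # ignore blank or truncated lines
        state, nodelist = cols[4], cols[5]
        for node in nodelist.split(","):
            states[node] = state
    return states


def missing_nodes(expected, states):
    """Nodes defined in the config but absent from the sinfo output."""
    return sorted(expected - set(states))


states = parse_sinfo(SINFO_OUTPUT)
print("reported:", states)
print("missing:", missing_nodes(EXPECTED, states))
```

With the output above, the Lenovo node is reported as down* and the HP node is not listed at all. According to the sinfo man page, a trailing * on the state marks a node that is not responding.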


Advice needed
------------------------
Please let me know why I am seeing only one node.
Further, how is the total memory calculated? Can Slurm make use of GPU processing power as well?
Please let me know if I have missed something in the configuration or the explanation.

Thank you all

Best Regards,
Abhinandan H. Patil, +919886406214
https://www.AbhinandanHPatil.info
