[slurm-users] Ubuntu Cluster with Slurm
Abhinandan Patil
abhinandan_patil_1414 at yahoo.com
Wed May 13 13:42:34 UTC 2020
Dear All,
Preamble
----------
I want to form a simple cluster with three laptops:
abhi-Latitude-E6430 //This serves as the controller
abhi-Lenovo-ideapad-330-15IKB //Compute Node
abhi-HP-EliteBook-840-G2 //Compute Node
Aim
-------------
I want to make use of the CPU, GPU and RAM on all the machines when I execute Java or Python programs.
Implementation
------------------------
Here is the slurm.conf on machine abhi-Latitude-E6430:
ClusterName=linux
ControlMachine=abhi-Latitude-E6430
SlurmUser=abhi
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
SwitchType=switch/none
StateSaveLocation=/tmp
MpiDefault=none
ProctrackType=proctrack/pgid
NodeName=abhi-Lenovo-ideapad-330-15IKB RealMemory=12000 CPUs=2
NodeName=abhi-HP-EliteBook-840-G2 RealMemory=14000 CPUs=2
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
The same slurm.conf is copied to all the machines.
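One way to double-check that the copies really are identical is to compare a checksum of the file on each machine. The following is only a minimal sketch, assuming Python 3 is available on all three laptops; the function name config_digest is made up for illustration, and the location of slurm.conf depends on how Slurm was installed.

```python
import hashlib
from pathlib import Path


def config_digest(path):
    """Return the SHA-256 hex digest of the file at `path`.

    `path` is whatever location slurm.conf has on this machine;
    no default is assumed here because it varies between installs.
    """
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

Calling config_digest on the slurm.conf of each machine and comparing the three strings by eye should be enough: if any digest differs, that copy is not the same file.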
Observations
--------------------------------------
Now when I check the daemons, slurmd is running on both compute nodes and slurmctld is running on the controller:
abhi@abhi-HP-EliteBook-840-G2:~$ service slurmd status
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-05-13 18:50:01 IST; 1min 49s ago
Docs: man:slurmd(8)
Process: 98235 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 98253 (slurmd)
Tasks: 2
Memory: 2.2M
CGroup: /system.slice/slurmd.service
└─98253 /usr/sbin/slurmd
abhi@abhi-Lenovo-ideapad-330-15IKB:~$ service slurmd status
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-05-13 18:50:20 IST; 8s ago
Docs: man:slurmd(8)
Process: 71709 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 71734 (slurmd)
Tasks: 2
Memory: 2.0M
CGroup: /system.slice/slurmd.service
└─71734 /usr/sbin/slurmd
abhi@abhi-Latitude-E6430:~$ service slurmctld status
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-05-13 18:48:58 IST; 4min 56s ago
Docs: man:slurmctld(8)
Process: 97114 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 97116 (slurmctld)
Tasks: 7
Memory: 2.6M
CGroup: /system.slice/slurmctld.service
└─97116 /usr/sbin/slurmctld
However, sinfo on the controller lists only one node, and it is marked down*:
abhi@abhi-Latitude-E6430:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 down* abhi-Lenovo-ideapad-330-15IKB
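To make the reading of this output explicit, here is a minimal sketch that parses it, assuming the default sinfo column layout shown above and a single hostname per row (compressed node lists such as node[1-3] are not handled). The function name parse_sinfo is made up for illustration. As far as I understand the sinfo man page, a trailing "*" on the partition name marks the default partition, and a trailing "*" on the state means the node is not responding.

```python
# The sinfo output from the controller, copied verbatim.
SINFO_OUTPUT = """\
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 down* abhi-Lenovo-ideapad-330-15IKB
"""


def parse_sinfo(text):
    """Parse default `sinfo` output into a list of dicts, one per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # Trailing "*" on the partition name: default partition.
        row["DEFAULT"] = row["PARTITION"].endswith("*")
        row["PARTITION"] = row["PARTITION"].rstrip("*")
        # Trailing "*" on the state: node is not responding.
        row["RESPONDING"] = not row["STATE"].endswith("*")
        row["STATE"] = row["STATE"].rstrip("*")
        rows.append(row)
    return rows


rows = parse_sinfo(SINFO_OUTPUT)
for row in rows:
    status = "responding" if row["RESPONDING"] else "not responding"
    print(row["NODELIST"], row["STATE"], status)
```

So the controller knows about abhi-Lenovo-ideapad-330-15IKB but considers it down and unreachable, and abhi-HP-EliteBook-840-G2 does not appear at all.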
Advice needed
------------------------
Please let me know why I am seeing only one node.
Further, how is the total memory calculated? Can Slurm make use of GPU processing power as well?
Please let me know if I have missed something in the configuration or the explanation.
Thank you all
Best Regards,
Abhinandan H. Patil
+919886406214
https://www.AbhinandanHPatil.info