<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">FYI, after more internet sleuthing (searching for “juju slurm”), I came across this outstanding-looking project, Omnivector Slurm Distribution (OSD):
<a href="https://omnivector-solutions.github.io/osd-documentation/master/index.html">
https://omnivector-solutions.github.io/osd-documentation/master/index.html</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">This project uses Juju (a Canonical project) to deploy, configure, and manage a Slurm cluster along with a variety of other components, such as the Slurm REST API, Prometheus integration, log forwarding via Fluent Bit
to Graylog, and others.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Deployment targets include clouds (AWS, OpenStack), local LXD, and MAAS for bare metal…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I’ve only started to play with OSD, but it looks like a great framework for deploying Slurm clusters.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Quick install on an Ubuntu 22.04 LTS host:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">sudo snap install juju --classic<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">sudo snap install lxd<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">lxd init --auto<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">lxc network set lxdbr0 ipv6.address none<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">sudo ufw allow 8443/tcp<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">juju bootstrap --show-log localhost<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Followed by a quick test of sinfo:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">juju run --unit slurmctld/0 "sinfo"<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">PARTITION AVAIL TIMELIMIT NODES STATE NODELIST<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">osd-slurmd up infinite 1 down* juju-65df3d-2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">juju run --unit slurmctld/0 "sinfo -R"<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">REASON USER TIMESTAMP NODELIST<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">New node slurm 2023-03-15T01:21:21 juju-65df3d-2<o:p></o:p></span></p>
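<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">A minimal sketch for picking out nodes like this automatically, assuming the column layout shown in the sinfo output above (STATE in the fifth column, NODELIST in the sixth). The helper name down_nodes is hypothetical and is not part of Slurm, Juju, or OSD:<o:p></o:p></span></p>
<pre>
```shell
# Hypothetical helper (not part of Slurm, Juju, or OSD): reads plain `sinfo`
# output on stdin and prints the NODELIST of every row whose STATE starts
# with "down". Assumes the default six-column layout quoted above.
down_nodes() {
    # Skip the header row, then match the fifth field (STATE).
    awk 'NR != 1 { if ($5 ~ /^down/) print $6 }'
}

# Sample text copied from the sinfo output quoted above.
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
osd-slurmd up infinite 1 down* juju-65df3d-2'

printf '%s\n' "$sample" | down_nodes
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">Against a live deployment, the same filter could presumably be fed from juju run --unit slurmctld/0 "sinfo" instead of the sample text, provided the command output reaches stdout unchanged.<o:p></o:p></span></p>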
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Mike<o:p></o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Hanby, Mike &lt;mhanby@uab.edu&gt;<br>
<b>Date: </b>Wednesday, February 15, 2023 at 1:51 PM<br>
<b>To: </b>slurm-users@lists.schedmd.com &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject: </b>[slurm-users] Running Containerized Slurmctld and Slurmdb in Production?<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual">Howdy,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual">Just wondering if any sites are running containerized Slurmctld and Slurmdbd in production?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual">We are planning a migration from a single host running slurmctld, slurmdbd, and MySQL (and other HPC services) to separate OpenStack VMs. Our site averages
less than 1000’s of running / pending jobs at any given time. Like many HPC sites, our jobs are a mix of long-running jobs, large arrays, very short jobs…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual">I ran across this GitHub project, “<b>Slurm Docker Cluster</b>”
<a href="https://github.com/giovtorres/slurm-docker-cluster">
https://github.com/giovtorres/slurm-docker-cluster</a>, and it got me thinking that this method might be great for simpler upgrades, ease of reproducing the cluster in development, etc…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual">How about it, anyone running containerized Slurm server processes in production?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-ligatures:standardcontextual">Thanks, Mike<o:p></o:p></span></p>
</div>
</body>
</html>