<div dir="ltr"><div>OK, so I am a step further (I hope), but I am still stuck with a non-working cluster.<br></div><div><br></div><div>I managed to solve both problems above by installing two Debian packages (sudo apt install mailutils libpmix-dev) on both the head and compute nodes.</div><div><br></div><div>I now have no errors in the two log files, but somehow the node is still drained.</div><div><br></div><div>How do I get around this, please?<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 7 Nov 2023 at 17:43, JP Ebejer <<a href="mailto:jean.p.ebejer@um.edu.mt" target="_blank">jean.p.ebejer@um.edu.mt</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 7 Nov 2023 at 11:34, Diego Zuccato <<a href="mailto:diego.zuccato@unibo.it" target="_blank">diego.zuccato@unibo.it</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 07/11/2023 11:15, JP Ebejer wrote:<br>
> but on running sinfo <br>
> right after, the node is still "drained".<br>
<br>
That's not normal :(<br>
Look at the slurmd log on the node for a reason. Probably the node <br>
detects an error and sets itself to drained. Another possibility is that <br>
slurmctld detects a mismatch between the node and its config: in this <br>
case you'll find the reason in slurmctld.log.<br></blockquote><div><br></div><div>OK, great. I cleared slurmd.log on the compute-0 node and restarted the service (after changing the logging from debug3 to verbose).</div><div><br></div><div><span style="font-family:monospace">[2023-11-07T16:34:17.575] topology/none: init: topology NONE plugin loaded<br>[2023-11-07T16:34:17.575] route/default: init: route default plugin loaded<br>[2023-11-07T16:34:17.577] task/affinity: init: task affinity plugin loaded with CPU mask 0xffffffff<br>[2023-11-07T16:34:17.578] cred/munge: init: Munge credential signature plugin loaded<br>[2023-11-07T16:34:17.578] slurmd version 22.05.8 started<span style="color:rgb(255,0,0)"><br>[2023-11-07T16:34:17.579] error:  mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:195: pmi/pmix: can not load PMIx library<br>[2023-11-07T16:34:17.579] error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed<br>[2023-11-07T16:34:17.579] error: MPI: Cannot create context for mpi/pmix<br>[2023-11-07T16:34:17.580] error:  mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:195: pmi/pmix: can not load PMIx library<br>[2023-11-07T16:34:17.580] error: Couldn't load specified plugin name for mpi/pmix_v4: Plugin init() callback failed<br>[2023-11-07T16:34:17.580] error: MPI: Cannot create context for mpi/pmix_v4</span><br>[2023-11-07T16:34:17.580] slurmd started on Tue, 07 Nov 2023 16:34:17 +0000<br>[2023-11-07T16:34:17.580] CPUs=32 Boards=1 Sockets=2 Cores=8 Threads=2 Memory=64171 TmpDisk=1031475 Uptime=87818 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)</span></div><div><span style="font-family:monospace"><br></span></div><div><span style="font-family:arial,sans-serif">I am not sure I understand these errors, since my MPI setting is none (so </span>MpiDefault=none<span style="font-family:arial,sans-serif">).  
The jobs I intend to run do not use MPI.<br></span></div><div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">Could this be the cause, and how do I fix this (on Debian 12)?<br></span></div><div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">Also if I stop, truncate the log file, and start the slurmctld service I see similar errors.  Below:<br></span></div><div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:monospace"><span style="color:rgb(255,0,0)">[2023-11-07T16:40:22.888] error: Configured MailProg is invalid</span><br>[2023-11-07T16:40:22.889] slurmctld version 22.05.8 started on cluster mycluster<br>[2023-11-07T16:40:22.890] cred/munge: init: Munge credential signature plugin loaded<br>[2023-11-07T16:40:22.892] select/cons_res: common_init: select/cons_res loaded<br>[2023-11-07T16:40:22.892] select/cons_tres: common_init: select/cons_tres loaded<br>[2023-11-07T16:40:22.892] select/cray_aries: init: Cray/Aries node selection plugin loaded<br>[2023-11-07T16:40:22.893] preempt/none: init: preempt/none loaded<br>[2023-11-07T16:40:22.894] ext_sensors/none: init: ExtSensors NONE plugin loaded<br><span style="color:rgb(255,0,0)">[2023-11-07T16:40:22.895] error:  mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:195: pmi/pmix: can not load PMIx library<br>[2023-11-07T16:40:22.895] error: Couldn't load specified plugin name for mpi/pmix_v4: Plugin init() callback failed<br>[2023-11-07T16:40:22.895] error: MPI: Cannot create context for mpi/pmix_v4</span><br>[2023-11-07T16:40:22.899] accounting_storage/none: init: Accounting storage NOT INVOKED plugin loaded<br>[2023-11-07T16:40:22.901] No memory enforcing mechanism configured.<br>[2023-11-07T16:40:22.902] topology/none: init: topology NONE plugin loaded<br>[2023-11-07T16:40:22.904] sched: Backfill scheduler plugin loaded<br>[2023-11-07T16:40:22.904] route/default: 
init: route default plugin loaded<br>[2023-11-07T16:40:22.905] Recovered state of 1 nodes<br>[2023-11-07T16:40:22.905] Recovered JobId=8 Assoc=0<br>[2023-11-07T16:40:22.905] Recovered JobId=9 Assoc=0<br>[2023-11-07T16:40:22.905] Recovered JobId=10 Assoc=0<br>[2023-11-07T16:40:22.905] Recovered JobId=11 Assoc=0<br>[2023-11-07T16:40:22.905] Recovered information about 4 jobs<br>[2023-11-07T16:40:22.906] select/cons_tres: select_p_node_init: select/cons_tres SelectTypeParameters not specified, using default value: CR_Core_Memory<br>[2023-11-07T16:40:22.906] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions<br>[2023-11-07T16:40:22.906] Recovered state of 0 reservations<br>[2023-11-07T16:40:22.906] State of 0 triggers recovered<br>[2023-11-07T16:40:22.906] read_slurm_conf: backup_controller not specified<br>[2023-11-07T16:40:22.906] select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure<br>[2023-11-07T16:40:22.906] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions<br>[2023-11-07T16:40:22.906] Running as primary controller<br>[2023-11-07T16:40:22.907] No parameter for mcs plugin, default values set<br>[2023-11-07T16:40:22.907] mcs: MCSParameters = (null). ondemand set.</span><br><br><br></div><div>Is this a step closer to resolution?<br></div><div> <br></div></div><div dir="ltr" class="gmail_signature"><div dir="ltr"><table style="border-collapse:collapse;color:rgb(0,0,0);font-family:Arial;font-size:14px"><tbody><tr><td style="vertical-align:top;padding-right:28.625px"><br></td><td style="padding:0px"><br></td></tr></tbody></table></div></div></div>
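<div>Following the suggestion above to look in slurmd.log and slurmctld.log, here is a minimal Python sketch that pulls only the <span style="font-family:monospace">error:</span> lines out of a log excerpt. The names <span style="font-family:monospace">find_errors</span> and <span style="font-family:monospace">ERROR_LINE</span> are hypothetical, and the bracketed timestamp layout is assumed from the excerpts quoted in this thread. It only surfaces error lines; it does not by itself tell you why a node is drained.</div>

```python
import re

# Assumed line layout, taken from the excerpts in this thread:
#   [2023-11-07T16:34:17.579] error: some message
ERROR_LINE = re.compile(r"^\[([^\]]+)\]\s+error:\s*(.*)$")


def find_errors(log_text):
    """Return a list of (timestamp, message) pairs, one per 'error:' line."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            errors.append((match.group(1), match.group(2).strip()))
    return errors


if __name__ == "__main__":
    # Sample lines copied from the slurmctld excerpt above.
    sample = "\n".join([
        "[2023-11-07T16:40:22.888] error: Configured MailProg is invalid",
        "[2023-11-07T16:40:22.889] slurmctld version 22.05.8 started on cluster mycluster",
        "[2023-11-07T16:40:22.895] error: MPI: Cannot create context for mpi/pmix_v4",
    ])
    for timestamp, message in find_errors(sample):
        print(timestamp, message)
```

<div>If the logs show no errors, Slurm itself can report the reason recorded for a drained node: <span style="font-family:monospace">sinfo -R</span> lists it, and <span style="font-family:monospace">scontrol show node compute-0</span> prints it in the Reason field.</div>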
</blockquote></div><br clear="all"><br><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr">Prof. Jean-Paul Ebejer | Associate Professor<br>Centre for Molecular Medicine and Biobanking<br>University of Malta, Msida, MSD 2080. MALTA.</div></div>