Hi Mick,
Thanks for the info. I already discovered below how to build from the dev branch. I hope that Michael can add some build information to the NHC page https://github.com/mej/nhc?tab=readme-ov-file#installation
/Ole
On 8/26/25 14:13, Timony, Mick via slurm-users wrote:
Hi,
I found Michael's presentation useful for helping me build the dev branch. There are brief instructions on page 15, which I modified to use the dev branch. I am willing to help test a new version, time permitting.
https://hpckp.org/wp-content/uploads/2022/10/13-HPCKP6-Michael-Jennings.pdf
Kind regards,
-- Mick Timony Senior DevOps Engineer LASER, Longwood, & O2 Cluster Admin Harvard Medical School
--
*From:* Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* Tuesday, August 26, 2025 5:49 AM
*To:* slurm-users@lists.schedmd.com
*Subject:* [slurm-users] Re: [EXTERNAL] Node Health Check Program

Hi Michael,
Thanks a lot for the hints for building an NHC RPM. I confirm that this works for me.
As I wrote below "perhaps trivial to some of you, and stumbling blocks to others" :-) Could you kindly add your instructions to https://github.com/mej/nhc?tab=readme-ov-file#installation ?
IMHO, the below commands "git clone ... ; git switch dev" ought to be documented as well - again for the benefit of those of us not well versed in Git and GNU Autotools.
Best regards, Ole
On 8/25/25 21:50, Jennings, Michael E via slurm-users wrote:
Hi guys!
NHC builds like any other GNU Autotools-based package: `./autogen.sh <configure-args> && make dist`
That's all you need to do to generate the correct tarball. From there, `rpmbuild -ta` is one option. I use Mezzanine tools, so I just run `mzbuild`. So whenever I go to build new RPMs for the production teams, all I have to do is "./autogen.sh && make dist && mzbuild" [1], or the (mostly) equivalent "./autogen.sh && make dist && rpmbuild -ta lbnl-nhc-1.5.tar.gz", either of which spits out the RPM and SRPM for me.
Hope this helps!
Michael
[1] Technically, this is a lie. What I *actually* run is this: "./autogen.sh && make distcheck && zbuild". The "check" part ensures that everything is set up correctly for out-of-tree builds, cross compiling, etc. But that's something only I really need to worry about. The `zbuild` command is actually from yet another project; it allows me to build for multiple distributions at once by leveraging containers. But again, not something the average person would care about.
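For anyone who wants to try the route described above without committing to it, here is a minimal sketch that wraps the same three commands in a shell function with a dry-run switch. The helper names `run` and `build_nhc_rpm` and the `DRY_RUN` variable are assumptions made for illustration and are not part of NHC; the tarball name is the one quoted in the message and may differ for other versions.

```shell
# Illustrative helpers (not part of NHC). Source this file from the top of
# an NHC checkout, then call build_nhc_rpm.

run() {
    # Echo the command, then execute it unless DRY_RUN=1 is set.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

build_nhc_rpm() {
    # Tarball name as quoted in the message; pass another name as $1
    # if "make dist" produces a different one.
    tarball=${1:-lbnl-nhc-1.5.tar.gz}

    # Same sequence as in the message: bootstrap, create the dist
    # tarball, then build the RPM and SRPM from that tarball.
    run ./autogen.sh &&
    run make dist &&
    run rpmbuild -ta "$tarball"
}
```

Running `DRY_RUN=1 build_nhc_rpm` only prints the three commands; without `DRY_RUN` they are executed in order and the chain stops at the first failure.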
-- Michael E. Jennings (he/him) mej@lanl.gov https://hpc.lanl.gov/
HPC Platform Integration Engineer - Platforms Design Team - HPC Design Group Ultra-Scale Research Center (USRC), 4200 W Jemez #301-25 +1 (505) 412-4151 Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87545-0001
*From:* Otto, Frank via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* Friday, August 22, 2025 06:20
*To:* slurm-users@lists.schedmd.com
*Subject:* [slurm-users] Re: [EXTERNAL] Node Health Check Program

Hi Ole,
this looks similar to what I've been doing for building RPMs. (It's documented for our in-house branch at [1], if anyone wants to compare.) Happy to see I'm not doing something totally stupid. :)
[1] https://github.com/UCL-ARC/nhc/blob/ucl/README.md
Thanks, Frank
-- Dr. Frank Otto Principal Research Infrastructure Developer Advanced Research Computing Centre University College London, UK
*From:* Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* 22 August 2025 12:17
*To:* slurm-users@lists.schedmd.com
*Subject:* [slurm-users] Re: [EXTERNAL] Node Health Check Program
On 8/19/25 21:25, Jennings, Michael E via slurm-users wrote:
Have you by chance given the `dev` branch a try? All our production servers currently run `lbnl-nhc-1.5-0.82.gf8dc.el8.noarch` built from the `dev` branch, have been for some time now, and it's been rock solid. Our RHEL-based clusters also use this version. Our HPE/Cray Shasta clusters, including our largest (classified) clusters
Crossroads, Tycho, and Venado, use a variant. (Long story short, I've merged in all my changes into a separate branch, but the reverse is not yet true.) This variant is, at present, COS/SLES-specific, but it has quite a few useful additional checks (many of them Cray-centric) contributed by other LANL folks that I haven't had a chance to upstream yet.
Following Michael's recommendation, I wanted to try out the 'dev' branch (version 1.5) of NHC and build the RPM package that Michael referred to.
Since I'm not a software developer, I had to figure out for myself the detailed building steps - perhaps trivial to some of you, and stumbling blocks to others. This is what I came up with:
$ git clone https://github.com/mej/nhc.git
$ cd nhc
$ git switch dev                   # Switch to the 'dev' branch
$ git status                       # Check the status
$ grep nhc_version configure.ac    # Verify the 'dev' version
m4_define([nhc_version], [1.5])
$ ./autogen.sh                     # Undocumented build requirement
$ cd ..
$ mv nhc lbnl-nhc-1.5              # Rename the source folder
$ tar czf lbnl-nhc-1.5.tar.gz lbnl-nhc-1.5
$ rpmbuild -ta lbnl-nhc-1.5.tar.gz
The resulting RPM package is:
~/rpmbuild/RPMS/noarch/lbnl-nhc-1.5-0.82.gf8dc.el8.noarch.rpm
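The steps above hard-code the version 1.5 in the folder and tarball names. As a possible refinement, the sketch below reads the version from configure.ac instead, assuming the file keeps the `m4_define([nhc_version], [...])` line shown in the grep output. The function names `nhc_version_from` and `nhc_tarball_name` are hypothetical helpers, not something NHC provides.

```shell
# Hypothetical helpers (not part of NHC) for deriving names from configure.ac.

# Print the version declared by a line of the form
#   m4_define([nhc_version], [1.5])
# in the configure.ac file given as $1.
nhc_version_from() {
    sed -n 's/^m4_define(\[nhc_version], *\[\([^]]*\)]).*/\1/p' "$1"
}

# Print the tarball name used in the steps above for that version,
# e.g. lbnl-nhc-<version>.tar.gz.
nhc_tarball_name() {
    echo "lbnl-nhc-$(nhc_version_from "$1").tar.gz"
}
```

With these sourced, the rename step could be written as `mv nhc "lbnl-nhc-$(nhc_version_from nhc/configure.ac)"` after the `cd ..`, so the same commands keep working if the version in the 'dev' branch changes.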
I've added those steps to my Slurm Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#node-health-check
Any comments?