At a certain point, you’re talking about workflow orchestration. Snakemake [1] and its slurm executor plugin [2] may be a starting point, especially since Snakemake is a local-by-default tool. I wouldn’t try reproducing your entire “make” workflow in Snakemake. Instead, I’d define the roughly 60 parallel tasks you describe among jobs A, B, and C.
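As a very rough sketch of what that could look like (rule names, output files, and resource numbers below are placeholders, not anything from your setup), each of A, B, and C becomes a rule whose shell command just calls your existing make targets, and the slurm executor turns every rule instance into one job:

    # Snakefile -- minimal sketch, three independent top-level builds
    rule all:
        input: "boot.bin", "Image", "rootfs.ext4"

    # Job A: bootloader pieces, assembled into boot.bin
    rule boot:
        output: "boot.bin"
        threads: 4
        resources: mem_mb=8000, runtime=30
        shell: "make -C boot -j {threads} all"

    # Job B: Linux kernel
    rule kernel:
        output: "Image"
        threads: 16
        resources: mem_mb=16000, runtime=60
        shell: "make -C linux -j {threads} Image"

    # Job C: root filesystem via Buildroot
    rule rootfs:
        output: "rootfs.ext4"
        threads: 16
        resources: mem_mb=16000, runtime=90
        shell: "make -C buildroot"

The same Snakefile runs locally with "snakemake -j 4" (your laptop case) and on the cluster with "snakemake --executor slurm -j 60".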

 

[1] https://snakemake.github.io

[2] https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html

 

From: Duane Ellis via slurm-users <slurm-users@lists.schedmd.com>
Date: Sunday, June 9, 2024 at 9:51 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Software builds using slurm


I have been lurking here for several months hoping to see some examples that would help, but have not found any.

We have a Slurm system set up for Xilinx FPGA (HDL) builds, and I want to use it for software builds too.

What I seem to see is that Slurm talks about CPUs, GPUs, memory, etc. What I am looking for is “run my makefile (or shell script) on any available node”.
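The simple "just run make somewhere" case is already covered by sbatch itself; a minimal sketch (the path and resource numbers are made up):

    # submit "make" as a batch job; Slurm picks any node with 8 free CPUs / 16 GB
    sbatch --job-name=swbuild --cpus-per-task=8 --mem=16G \
           --output=build-%j.log \
           --wrap "make -C /path/to/project -j 8"

The job's stdout/stderr land in build-<jobid>.log, and squeue / sacct report its state and exit code.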

In our case we have three top-level jobs, A, B, and C.
These can all run in parallel and are independent (i.e. the bootloader, the Linux kernel, and the Linux root filesystem via Buildroot).

Job A (boot) is actually about 7 small builds that are independent

I am looking for a means to fork n jobs (i.e. jobs A, B, and C above) across the cluster, then wait for them and collect the standard output and exit status of each.
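A minimal way to get that behaviour with plain sbatch is --wait, which makes sbatch block until the submitted job finishes and return the job's exit status; a sketch with made-up script names:

    #!/bin/bash
    # submit A, B and C in parallel; each sbatch blocks until its job ends
    sbatch --wait --output=A.log build_boot.sh   & pid_a=$!
    sbatch --wait --output=B.log build_kernel.sh & pid_b=$!
    sbatch --wait --output=C.log build_rootfs.sh & pid_c=$!

    wait "$pid_a"; status_a=$?
    wait "$pid_b"; status_b=$?
    wait "$pid_c"; status_c=$?

    # stdout of each job is in A.log / B.log / C.log
    echo "exit codes: A=$status_a B=$status_b C=$status_c"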

Job A would then fork and build 7 to 8 sub-jobs.
When they are done, it would assemble the results into what Xilinx calls boot.bin.
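That fan-out-then-assemble pattern maps onto sbatch job dependencies; roughly (the sub-build and script names below are invented):

    # submit the independent sub-builds and remember their job IDs
    ids=""
    for part in fsbl pmufw atf u-boot dtb; do
        id=$(sbatch --parsable "build_${part}.sh")
        ids="$ids:$id"
    done
    # assemble boot.bin only once every sub-build has finished successfully
    sbatch --dependency="afterok${ids}" assemble_boot_bin.sh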

Job B is a Linux kernel build

Job C is Buildroot, so there are many (n=50) smaller builds, i.e. bash, busybox, and other tools like Python for the target; again, each of these can be executed in parallel.
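If those package builds really are independent, the Slurm-side pattern for "n similar tasks" is a job array; this is a sketch only (it glosses over Buildroot's inter-package dependencies and assumes a shared filesystem, an invented script name, and a packages.txt listing one package per line):

    #!/bin/bash
    #SBATCH --array=0-49              # 50 array tasks, one per package
    #SBATCH --output=pkg-%a.log
    # pick the package name for this array index from packages.txt
    pkg=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" packages.txt)
    exec make -C buildroot "$pkg"

Submitted once with "sbatch build_packages.sh", Slurm spreads the 50 tasks across whatever nodes are free.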

I really do not want to (and cannot) re-architect my build to be a Slurm-only build, because it also needs to be able to run without Slurm, i.e. build everything on my laptop with no Slurm present.

In that case the jobs would run serially and take an hour or so; the hope is that by parallelizing the software build jobs our overall cycle time will improve.
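One way to keep a single entry point for both cases is a small wrapper that only uses sbatch when it exists; a sketch with invented script names:

    #!/bin/bash
    # top-level driver: parallel via Slurm when available, serial otherwise
    builds="build_boot.sh build_kernel.sh build_rootfs.sh"

    if command -v sbatch >/dev/null 2>&1; then
        # cluster case: submit all builds at once and wait for them
        for b in $builds; do
            sbatch --wait --output="$b.log" "$b" &
        done
        wait
    else
        # laptop case: no Slurm present, run the same scripts one after another
        for b in $builds; do
            bash "$b" > "$b.log" 2>&1
        done
    fi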

It would also be nice if the Slurm cluster would adapt to the available nodes automatically.

Our hope is that we can run our lab PCs as dual-boot machines: they normally boot Windows, but we can dual-boot them into Linux, at which point they become compile nodes and auto-join the cluster, and the cluster sees them go offline when somebody reboots a machine back to Windows.
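Slurm 22.05 and newer have "dynamic node" support that is close to this; a very rough sketch, leaving out details like munge keys, shared configuration, and firewalling (node names and sizes below are made up):

    # slurm.conf on the controller: leave headroom for nodes that register themselves
    MaxNodeCount=64
    PartitionName=build Nodes=ALL Default=YES State=UP

    # on a lab PC after it boots into Linux: join the cluster as a dynamic node
    slurmd -Z --conf "CPUs=8 RealMemory=16000"

    # when the PC is rebooted back to Windows its slurmd vanishes and the node
    # is marked down; it can later be removed with
    scontrol delete nodename=labpc01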


