I have been lurking here for a while hoping to see some examples that would help, but have not for several months.
We have a Slurm system set up for Xilinx FPGA builds (HDL); I want to use this for software builds too.
What I seem to see is that Slurm talks about CPUs, GPUs, memory, etc. What I am looking for is a "run my makefile (or shell script) on any available node" capability.
In our case we have 3 top-level jobs, A, B, and C. These can all run in parallel and are independent (i.e. the bootloader, the Linux kernel, and the Linux root file system via Buildroot).
Job A (boot) is actually about 7 small builds that are independent.
I am looking for a means to fork n jobs (i.e. jobs A, B, and C above) across the cluster, then wait for and collect the stdout and exit status of those n jobs.
Job A would then fork and build 7 to 8 sub-jobs. When they are done, it would assemble the result into what Xilinx calls boot.bin.
Job B is a Linux kernel build.
Job C is Buildroot, so there are several (n = 50) smaller builds, i.e. bash, BusyBox, and other tools like Python for the target; again, each of these can be executed in parallel.
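Conceptually, what I want is something like this shell sketch (the directory names and log files here are made up, just to illustrate the shape of it):

    # launch each top-level build as its own Slurm job, capturing stdout
    srun -N1 sh -c 'make -C boot'      > jobA.log 2>&1 & pid_a=$!
    srun -N1 sh -c 'make -C linux'     > jobB.log 2>&1 & pid_b=$!
    srun -N1 sh -c 'make -C buildroot' > jobC.log 2>&1 & pid_c=$!
    # srun exits with the remote command's exit status, so wait collects it
    wait $pid_a; status_a=$?
    wait $pid_b; status_b=$?
    wait $pid_c; status_c=$?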
I really do not want to (cannot) re-architect my build to be a Slurm-only build, because it also needs to be able to run without Slurm, i.e. build everything on my laptop with no Slurm present.
In that case the jobs would run serially and take an hour or so; the hope is that by parallelizing the software build jobs, our overall cycle time will improve.
It would also be nice if the Slurm cluster would adapt to the available nodes automatically.
Our hope is that we can run our lab PCs as dual-boot: they normally boot Windows, but we can dual-boot them into Linux so that they become compile nodes and auto-join the cluster, and the cluster sees them go offline when somebody reboots a machine back to Windows.
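From reading the docs, I gather something like the following slurm.conf fragment is roughly what that would take (the values here are my guesses, not a tested config):

    ReturnToService=2     # a DOWN node rejoins automatically when its slurmd re-registers
    SlurmdTimeout=300     # mark a node DOWN if its slurmd stops responding for 5 minutes
    NodeName=labpc[01-20] CPUs=8 RealMemory=16000 State=UNKNOWN
    PartitionName=build Nodes=labpc[01-20] Default=YES State=UP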
You have two options for managing those dependencies, as I see it:
1. You use SLURM's native job dependencies, but this requires you to create a build script specifically for SLURM (see the sketch after this list).
2. You use make to submit the jobs and take advantage of the -j flag to run lots of tasks at once; just use a job-starter prefix so that tasks you want run under SLURM are prefixed with srun.
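A minimal sketch of the first approach (the script names here are hypothetical):

    jid_a=$(sbatch --parsable jobA.sh)
    jid_b=$(sbatch --parsable jobB.sh)
    # assemble.sh starts only after both builds complete successfully
    sbatch --dependency=afterok:${jid_a}:${jid_b} assemble.sh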
The first approach will get the jobs run soonest. The second approach is a bit of a hack, and it means that the dependent jobs don’t get submitted until the previous jobs have finished, which isn’t ideal, but it does work, and it meets your requirement of having a single build process that works both with and without SLURM:
JOBSTARTER=srun -c 1 -t 00:01:00
SLEEP=60

all: jobC.out

clean:
	rm -f job[ABC].out

jobA.out:
	$(JOBSTARTER) sh -c "sleep $(SLEEP); echo done > $@"

jobB.out:
	$(JOBSTARTER) sh -c "sleep $(SLEEP); echo done > $@"

jobC.out: jobA.out jobB.out
	$(JOBSTARTER) sh -c "echo done > $@"
When you want to run it interactively, you set JOBSTARTER to be empty; otherwise you use some suitable srun command to run the tasks under SLURM, and the above makefile does this:
$ make -j
srun -c 1 -t 00:01:00 sh -c "sleep 60; echo done > jobA.out"
srun -c 1 -t 00:01:00 sh -c "sleep 60; echo done > jobB.out"
srun: job 13324201 queued and waiting for resources
srun: job 13324202 queued and waiting for resources
srun: job 13324201 has been allocated resources
srun: job 13324202 has been allocated resources
srun -c 1 -t 00:01:00 sh -c "echo done > jobC.out"
srun: job 13324220 queued and waiting for resources
srun: job 13324220 has been allocated resources
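For an interactive run without SLURM, you can also just override the variables on the command line, e.g.:

$ make -j JOBSTARTER= SLEEP=1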
Regards,
Tim
--
Tim Cutts
Scientific Computing Platform Lead, AstraZeneca
At a certain point, you’re talking about workflow orchestration. Snakemake [1] and its slurm executor plugin [2] may be a starting point, especially since Snakemake is a local-by-default tool. I wouldn’t try reproducing your entire “make” workflow in Snakemake. Instead, I’d define the roughly 60 parallel tasks you describe among jobs A, B, and C.
[1] https://snakemake.github.io
[2] https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm....
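As a rough sketch (the targets and commands below are invented for illustration, not taken from the original build), the three top-level jobs might be expressed as rules that shell out to the existing makefiles:

rule all:
    input: "boot.bin", "zImage", "rootfs.tar"

rule boot:
    output: "boot.bin"
    shell: "make -C boot"

rule kernel:
    output: "zImage"
    shell: "make -C linux"

rule rootfs:
    output: "rootfs.tar"
    shell: "make -C buildroot"

Run it locally with "snakemake --jobs 4", or on the cluster with "snakemake --executor slurm --jobs 60" once the executor plugin is installed.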