I have confirmed that the issue is Ubuntu 20.04.  I used the tmate GitHub action to get access to the Ubuntu 20.04 GitHub arm runner and tried the steps manually one by one.  It did indeed fail, almost immediately, in the "debuild -b -uc -us" step.  Given that the same experiment on an Ubuntu 22.04 arm EC2 instance was successful, it appears that 22.04+ is required.  I was hoping not to have to go down that road, but I am now looking at updating all downstream dependencies to 22.04.

If anyone can confirm/deny that 20.04 doesn’t work, I’d be interested in hearing your experience.

On Jun 14, 2024, at 9:45 AM, Christopher Harrop via slurm-users <slurm-users@lists.schedmd.com> wrote:

The commands were grouped like that because they are part of a RUN in a Dockerfile.  The build was happening on a GitHub Actions runner, so it was not so easy to just run them interactively one at a time.  But I'm pretty confident that it was the "debuild -b -uc -us" that failed.

I have since gathered some more information.  I started an Ubuntu-22.04 EC2 arm instance (because I don't have access to an arm machine any other way) and ran the commands, and they all completed and built the packages just fine.  My container, however, is using Ubuntu-20.04.  Unfortunately, the arm architecture is not available for the Ubuntu 20.04 AMI on EC2 (at least for me), so I was not able to do a clean test of 20.04.  I suspect it's a problem with 20.04, and that 22.04+ is required.  I can add a "mxschmitt/action-tmate@v3" GitHub action to my CI step to try to get interactive access to the GitHub runner at failure time and see if I can reproduce the failure manually.  I was hoping not to update to 22.04 yet due to downstream dependencies for my container, but it looks like that might be unavoidable.

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com