[slurm-users] slurm and singularity
Groner, Rob
rug262 at psu.edu
Tue Feb 7 21:52:36 UTC 2023
Looks like we can go the route of a wrapper script, since our users don't specifically need to know they're running sbatch. Thanks for the suggestion.
The remaining issue is how to put them into an allocation that is actually running a singularity container. I don't get why what I'm doing now results in an allocation where I'm still in a container on the submit node!
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Brian Andrus <toomuchit at gmail.com>
Sent: Tuesday, February 7, 2023 12:52 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] slurm and singularity
The job script itself should contain the singularity/apptainer command.
I am guessing you don't want your users to have to deal with that part in their scripts, so I would suggest using a wrapper script.
You could just have them run something like: cluster_run.sh <path_to_script>
Then cluster_run.sh would call sbatch along with the appropriate commands.
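A minimal sketch of what cluster_run.sh could look like (the container path /containers/default.sif and the pass-through of extra arguments to sbatch are assumptions for the sketch, not anything Slurm mandates):

    #!/bin/bash
    # cluster_run.sh -- hypothetical wrapper: submit a user script so that it
    # runs inside a Singularity/Apptainer container on the compute node.
    # Usage: cluster_run.sh <path_to_script> [sbatch options...]
    set -euo pipefail

    SIF=/containers/default.sif    # assumed site-wide container image
    script=$1
    shift                          # any remaining arguments go to sbatch

    # --wrap makes sbatch generate the batch script around this one command.
    sbatch "$@" --wrap "singularity exec $SIF bash $(printf '%q' "$script")"

A user would then run something like cluster_run.sh myjob.sh -N 1 -t 10:00 and never see the sbatch or singularity plumbing.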
Brian Andrus
On 2/7/2023 9:31 AM, Groner, Rob wrote:
I'm trying to set up the capability where a user can execute:
$: sbatch <parameters> script_to_run.sh
and the end result is that a job is created on a node, and that job will execute "singularity exec <path to container> script_to_run.sh"
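One way to get exactly that without touching the user's script, assuming the container path is known ahead of time, is to let sbatch generate the wrapping batch script itself via --wrap:

$: sbatch <parameters> --wrap "singularity exec <path to container> bash script_to_run.sh"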
Also, that they could execute:
$: salloc <parameters>
and would end up on a node per their parameters, and instead of a bash prompt, they would have a singularity prompt because they're inside a running container.
Oddly, I ran: salloc <parameters> /usr/bin/singularity shell <path to sif>, and it allocated, said the node was ready, and gave me an apptainer prompt...cool! But when I asked what hostname I was on, I was NOT on the node it had said was ready; I was still on the submit node. When I exit the apptainer shell, it ends my allocation. Sooo...it gave me the allocation and started the apptainer shell, but somehow I was still on the submit node.
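For what it's worth, this is documented salloc behavior rather than a container quirk: salloc runs the supplied command as a local child process on the submit host, and only job steps launched with srun actually land on the allocated node. A sketch of the interactive case that should put the container shell on the compute node (same placeholders as above):

$: salloc <parameters> srun --pty /usr/bin/singularity shell <path to sif>

(Newer Slurm releases can also do this for a bare salloc via LaunchParameters=use_interactive_step in slurm.conf.)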
As far as the batch job goes, I've done some experiments using job_submit.lua to replace the script with one that has a singularity call in it instead, and that might hold some promise. But I'd have to write the passed-in script to a temp file or something, and then have singularity exec that. That MIGHT work.
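A rough sketch of what that replacement script might look like, assuming the job_submit.lua plugin (or a wrapper) pastes the original script body into the heredoc, with the container path again a placeholder:

    #!/bin/bash
    # Hypothetical replacement batch script -- a sketch of the temp-file idea,
    # not a tested plugin. The original user script body would be embedded in
    # the heredoc below by whatever performs the substitution.
    set -euo pipefail

    TMPSCRIPT=$(mktemp /tmp/user_script.XXXXXX)
    trap 'rm -f "$TMPSCRIPT"' EXIT

    cat > "$TMPSCRIPT" <<'ORIGINAL_SCRIPT'
    # ...original user script body goes here...
    ORIGINAL_SCRIPT

    singularity exec /path/to/container.sif bash "$TMPSCRIPT"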
The search results for "slurm and singularity" don't describe what I'm trying to do. The closest thing I can find is what Slurm touts on its website, a leftover from 2017 talking about a SPANK plugin that, as near as I can figure, doesn't exist. I read through the OCI docs on the Slurm website, but they show that using singularity that way requires all commands to be run with sudo. That's not going to work.
I'm running out of ideas here.
Thanks,
Rob