Thanks Davide,
It's true that srun will create an allocation if you aren't inside a job,
but if you are inside a job and you request more resources than that job
has, srun will just fail instead of creating a new allocation. That
failure is exactly what I want to avoid.
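For what it's worth, the direction we're currently exploring is falling back to sbatch --wait and handling signals ourselves. Below is a minimal sketch of just the signal-forwarding mechanism; in the real wrapper "$@" would be something like "sbatch --wait job.sh", and the trap would additionally call scancel on the job id parsed from sbatch's output (since, as noted below, signalling sbatch itself doesn't cancel the job). Those Slurm-specific parts are assumptions on my side and are left out so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch only: run a command in the background and forward SIGTERM/SIGINT
# to it, so that killing the wrapper also signals what it started. In the
# real Slurm case the trap would also scancel the submitted job (assumed
# usage, not tested against Slurm here).
run_forwarding() {
    "$@" &
    child=$!
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT
    wait "$child"   # the wrapper's exit status mirrors the child's
}
```

Whether that trap should scancel, or whether a signalled sbatch can be made to clean up after itself, is exactly the part we're unsure about.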
On Sat, Apr 5, 2025 at 11:48 AM Davide DelVento <davide.quantum(a)gmail.com>
wrote:
> The plain srun is probably the best bet, and if you really need the thing
> to be started from another slurm job (rather than the login node) you will
> need to exploit the fact that
>
> > If necessary, srun will first create a resource allocation in which to
> run the parallel job.
>
> AFAIK, there is no option to force the "create a resource allocation"
> even if it's not necessary. But you may try to request something that is
> "above and beyond" what the current allocation provides, and that might
> solve your problem.
> Looking at the srun man page, I could speculate that --clusters
> or --cluster-constraint might help in that regard (but I am not sure).
>
> Have a nice weekend
>
>
> On Fri, Apr 4, 2025 at 6:27 AM Michael Milton via slurm-users <
> slurm-users(a)lists.schedmd.com> wrote:
>
>> I'm helping with a workflow manager that needs to submit Slurm jobs. For
>> logging and management reasons, the job (e.g. srun python) needs to be run
>> as though it were a regular subprocess (python):
>>
>> - stdin, stdout and stderr for the command should be connected to
>> the process inside the job
>> - signals sent to the command should be sent to the job process
>> - We don't want to use the existing job allocation if this is run
>> from a Slurm job
>> - The command should only terminate when the job is finished, so
>> that we don't need to poll Slurm
>>
>> We've tried:
>>
>> - sbatch --wait, but then SIGTERM'ing the process doesn't kill the job
>> - salloc, but that requires a TTY process to control it (?)
>> - salloc srun seems to mess with the terminal when it's killed,
>> likely because of being "designed to be executed in the foreground"
>> - Plain srun re-uses the existing Slurm allocation, and specifying
>> resources like --mem will just request them from the current job rather
>> than submitting a new one
>>
>> What is the best solution here?
>>
>> --
>> slurm-users mailing list -- slurm-users(a)lists.schedmd.com
>> To unsubscribe send an email to slurm-users-leave(a)lists.schedmd.com
>>
>