[slurm-users] How to deal with jobs that need to be restarted several times

Selch, Brigitte (FIDF) Brigitte.Selch at man.eu
Wed Mar 13 13:54:21 UTC 2019


Hello,

Yes, that's it.
I can use salloc instead of sbatch.
The user can then test and run the job within this interactive Slurm allocation.
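
A minimal sketch of how that could look, assuming the user first opens an allocation with something like "salloc --nodes=1 --time=02:00:00" and then works in the shell that salloc starts. The function name rerun_step and the LAUNCHER variable are hypothetical and only serve this illustration; the resource options are placeholders to be adapted to the site.

```shell
# Sketch only. Start the allocation first, for example:
#   salloc --nodes=1 --time=02:00:00
# then define this function in the shell that salloc opens.

# rerun_step CMD [ARGS...]
# Runs CMD as a job step inside the current allocation and reports the
# result, so the user can fix the inputs and call it again without going
# back to the queue.
# LAUNCHER is an assumption of this sketch: it defaults to "srun" and can
# be overridden (e.g. LAUNCHER=env) to try the function without Slurm.
rerun_step() {
    launcher=${LAUNCHER:-srun}
    status=0
    "$launcher" "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "step OK"
    else
        echo "step failed with exit code $status; fix and rerun"
    fi
    return "$status"
}
```

Inside the allocation the user would call, for example, "rerun_step ./my_job.sh" (a placeholder script name), edit whatever is wrong, and call it again. The resources stay reserved until the salloc shell is closed with "exit".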

Thank you,

Brigitte Selch

-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Renfro, Michael
Sent: Tuesday, March 12, 2019 15:33
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] How to deal with jobs that need to be restarted several times

If the failures happen right after the job starts (or close enough), I’d use an interactive session with srun (or some other wrapper that calls srun, such as fisbatch).

Our hpcshell wrapper for srun is just a bash function:

=====

hpcshell ()
{
    srun --partition=interactive "$@" --pty bash -i
}

=====

The interactive partition argument is optional, but we use it as a time- and resource-limited partition with a higher priority. I always recommend that our users develop and debug with interactive jobs, and submit the full production job with sbatch only after all the easy bugs have been identified.
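
As an illustration of how extra arguments pass through the hpcshell function above, here is a variant with a dry-run switch. The HPCSHELL_DRY_RUN variable is hypothetical and not part of our real wrapper, and the partition name "interactive" is specific to our site; treat this as a sketch rather than the exact function we deploy.

```shell
# Variant of hpcshell that can print the srun command instead of running it.
# HPCSHELL_DRY_RUN is an assumption of this sketch; "interactive" is a
# site-specific partition name.
hpcshell() {
    if [ -n "${HPCSHELL_DRY_RUN:-}" ]; then
        # Show what would be executed, with the caller's options in place.
        echo srun --partition=interactive "$@" --pty bash -i
    else
        srun --partition=interactive "$@" --pty bash -i
    fi
}

# Example: request 4 CPUs for one hour and debug in the resulting shell.
#   hpcshell --cpus-per-task=4 --time=01:00:00
```

Anything the user passes lands between the partition option and "--pty bash -i", so the usual srun resource options can be used to size the debugging session.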

--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601     / Tennessee Tech University

> On Mar 12, 2019, at 9:26 AM, Selch, Brigitte (FIDF) <Brigitte.Selch at man.eu> wrote:
>
> Hello,
>
> Some jobs have to be restarted several times until they run.
> Users start the job, it fails, they make some changes, they
> start the job again, it fails again … and so on.
>
> So they want to keep the resources until the job is running properly.
>
> Is there a way to ‘inherit’ allocated resources from one job
> to the next?
>
> Or something else to do the job?
>
> All our jobs are submitted with sbatch
>
> Thank you,
> Brigitte Selch
>
>
>
> Kind regards,
> Brigitte Selch
>
> MAN Truck & Bus AG
> IT Produktentwicklung Simulation (FIDF) Vogelweiher Str. 33
> 90441 Nürnberg
>
> Phone +49 911 420 6056
> Brigitte.Selch at man.eu
>
>
>
> MAN Truck & Bus AG
> Registered office: München
> Court of registration: Amtsgericht München, HRB 86963
> Chairman of the Supervisory Board: Andreas Renschler
> Executive Board: Joachim Drees (Chairman), Dirk Große-Loheide, Dr.
> Carsten Intra, Michael Kobriger, Jan-Henrik Lafrentz, Göran Nyberg,
> Dr. Frederik Zohm
>
> You can find information about how we process your personal data and
> your rights in our data protection notice:
> www.man.eu/data-protection-notice
>
> This e-mail (including any attachments) is confidential and may be privileged.
> If you have received it by mistake, please notify the sender by e-mail and delete this message from your system.
> Any unauthorised use or dissemination of this e-mail in whole or in part is strictly prohibited.
> Please note that e-mails are susceptible to change.
> MAN Truck & Bus AG (including its group companies) shall not be liable for the improper or incomplete transmission of the information contained in this communication nor for any delay in its receipt.
> MAN Truck & Bus AG (or its group companies) does not guarantee that the integrity of this communication has been maintained nor that this communication is free of viruses, interceptions or interference.




