[slurm-users] Bug: incorrect output directory fails silently

Killian Murphy killian.murphy at york.ac.uk
Thu Jul 8 16:10:28 UTC 2021


You can't know the file system state at job runtime, but you can catch the
case where the output path can't be resolved at job submission time - I
expect this will catch the majority of issues (we also see this come up
fairly regularly!).
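
A minimal sketch of that submit-time check, in Python rather than as an actual cli_filter or job_submit plugin. The function name `check_output_path` and its return convention are hypothetical and only for illustration. It checks just that the directory part of the path exists and is writable on the submitting host, and it does not expand Slurm filename patterns such as %j, so a pattern in the directory part would be reported as missing.

```python
import os
from pathlib import Path


def check_output_path(path_str, cwd=None):
    """Return None if the output path looks usable from this host,
    otherwise a short description of the problem.

    Hypothetical helper: this is not part of Slurm.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    path = Path(path_str)
    # Relative paths are resolved against the submission directory.
    if not path.is_absolute():
        path = base / path
    parent = path.parent
    if not parent.is_dir():
        return f"directory does not exist: {parent}"
    # Creating a file needs write and search permission on the directory.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"directory is not writable: {parent}"
    return None
```

A wrapper around sbatch could call this on the --output, --error and --input values and refuse to submit when it returns a message. It only sees the file system as it is on the submitting host at submission time, so it cannot rule out a failure on the compute node later.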

On Thu, 8 Jul 2021 at 16:59, Marcus Boden <mboden at gwdg.de> wrote:

> I have already answered tons of tickets about this from users who are
> confused that their job fails silently.
> The problem is, you cannot solve this with a job_submit or cli_filter,
> as you do not know the situation of the file system at job runtime. Or
> even on the node in the end.
>
> At least the slurmd gives an error, so you could scan the logs for this
> error and maybe use that to automate something.
>
> Best,
> Marcus
>
> On 08.07.21 16:58, Jeffrey T Frey wrote:
> >> I understand that there is no output file to write an error message to,
> >> but it might be good to check the `--output` path during the scheduling,
> >> just like `--account` is checked.
> >>
> >> Does anybody know a workaround to be warned about the error?
> >
> > I would make a feature request of SchedMD to fix the issue, then I would
> > write a cli_filter plugin to validate the --output/--error/--input paths as
> > desired until Slurm itself handles it.
> >
> >
>

-- 
Killian Murphy
Research Software Engineer

Wolfson Atmospheric Chemistry Laboratories
University of York
Heslington
York
YO10 5DD
+44 (0)1904 32 1223

e-mail disclaimer: http://www.york.ac.uk/docs/disclaimer/email.htm