[slurm-users] Bug: incorrect output directory fails silently

Dries Boers dries.boers at slu.se
Tue Jul 13 08:58:23 UTC 2021

Thank you all for your replies. I will report the bug, as it is not as visible as it should be.

I celebrated yesterday by waiting for fifteen minutes for a job to start, which had failed silently 🎉

With kind regards,

On 2021-07-08 18:10, Killian Murphy wrote:
You can't know the file system state at job runtime, but you can catch the case where the output path can't be resolved at job submission time - I expect this will catch the majority of issues (we also see this come up fairly regularly!).
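
A minimal sketch of such a submission-time check, written as a standalone Python helper that could run before sbatch rather than as a job_submit or cli_filter plugin. The names check_output_path and main are hypothetical and not part of Slurm. It assumes relative paths are resolved against the job's working directory, and it skips any path whose directory part contains a filename pattern such as %j, since that cannot be resolved before the job exists. It only sees the file system of the submitting host, so it cannot rule out problems on the compute node at runtime.

```python
import os
import sys


def check_output_path(path, workdir):
    """Return a list of problems with a requested --output/--error path.

    Only checks what is visible on the submitting host right now.
    """
    # Assumption: relative paths are resolved against the job's working directory.
    full = path if os.path.isabs(path) else os.path.join(workdir, path)
    parent = os.path.dirname(full) or workdir

    # Filename patterns such as %j are expanded later; if one appears in
    # the directory part, the directory cannot be checked here.
    if "%" in parent:
        return []

    if not os.path.isdir(parent):
        return [f"directory does not exist: {parent}"]
    if not os.access(parent, os.W_OK | os.X_OK):
        return [f"directory is not writable: {parent}"]
    return []


def main(argv):
    """Check every path given on the command line; return a shell exit code."""
    problems = []
    for p in argv:
        problems += check_output_path(p, os.getcwd())
    for msg in problems:
        print(f"warning: {msg}", file=sys.stderr)
    return 1 if problems else 0
```

A wrapper script could call sys.exit(main(sys.argv[1:])) with the intended --output and --error values and only invoke sbatch when the result is 0.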

On Thu, 8 Jul 2021 at 16:59, Marcus Boden <mboden at gwdg.de> wrote:
I have already answered tons of tickets about this, from users who are
confused that their job fails silently.
The problem is that you cannot solve this with a job_submit or cli_filter
plugin, as you do not know the state of the file system at job runtime, or
even on the node where the job ends up.

At least the slurmd logs an error, so you could scan the logs for this
error and maybe use that to automate something.


On 08.07.21 16:58, Jeffrey T Frey wrote:
>> I understand that there is no output file to write an error message to, but it might be good to check the `--output` path during the scheduling, just like `--account` is checked.
>> Does anybody know a workaround to be warned about the error?
> I would make a feature request of SchedMD to fix the issue, then I would write a cli_filter plugin to validate the --output/--error/--input paths as desired until Slurm itself handles it.

Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience, HPC-Team
Tel.:   +49 (0)551 201-2191, E-Mail: mboden at gwdg.de
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: https://www.gwdg.de


Killian Murphy
Research Software Engineer

Wolfson Atmospheric Chemistry Laboratories
University of York
YO10 5DD
+44 (0)1904 32 1223

e-mail disclaimer: http://www.york.ac.uk/docs/disclaimer/email.htm
