[slurm-users] Bug: incorrect output directory fails silently

Dries Boers dries.boers at slu.se
Wed Jul 14 09:21:33 UTC 2021


Dear all,

I have now created a bug report: Bug 12017 - Incorrect output directory fails silently <https://bugs.schedmd.com/show_bug.cgi?id=12017>.

Apparently, the bug has already been reported several times in the past. It still got in my way, though, and I think it should be possible to implement a solution or workaround, and to communicate better about this bug with (new) users.


With kind regards,
Dries Boers

On 2021-07-13 10:58, Dries Boers wrote:
Thank you all for your replies. I will report the bug, as it is not as visible as it should be.

I celebrated yesterday by waiting for fifteen minutes for a job to start, which had failed silently 🎉


With kind regards,
Dries

On 2021-07-08 18:10, Killian Murphy wrote:
You can't know the file system state at job runtime, but you can catch the case where the output path can't be resolved at job submission time - I expect this will catch the majority of issues (we also see this come up fairly regularly!).
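
A minimal sketch of such a submission-time check, written as a standalone Python helper rather than an actual cli_filter plugin. The function name `check_output_path` and its `workdir` parameter are hypothetical; the check assumes that relative paths are resolved against the job's working directory and that a `%` in the directory part is a filename pattern that is only expanded later. It can only report what the submit host sees, not the state on the compute node at runtime.

```python
import os
from typing import Optional


def check_output_path(path: str, workdir: Optional[str] = None) -> Optional[str]:
    """Return a warning if the directory for `path` looks unusable, else None.

    Hypothetical helper: it inspects the file system as seen from the
    submit host only, so it cannot catch problems that appear later.
    """
    workdir = workdir or os.getcwd()
    # Assumption: relative paths are relative to the job's working directory.
    full = path if os.path.isabs(path) else os.path.join(workdir, path)
    directory = os.path.dirname(full) or workdir
    # Patterns such as %j in the directory part are not expanded yet,
    # so there is nothing meaningful to validate here.
    if "%" in directory:
        return None
    if not os.path.isdir(directory):
        return f"output directory does not exist: {directory}"
    # Creating a file needs write and search permission on the directory.
    if not os.access(directory, os.W_OK | os.X_OK):
        return f"output directory is not writable: {directory}"
    return None
```

A wrapper script could call this on the values given to --output, --error and --input before handing the job to sbatch, and print the returned warning instead of submitting.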

On Thu, 8 Jul 2021 at 16:59, Marcus Boden <mboden at gwdg.de> wrote:
I have already answered tons of tickets about this from users who are
confused that their job fails silently.
The problem is that you cannot solve this with a job_submit or cli_filter
plugin, as you do not know the state of the file system at job runtime,
or even on the node where the job ends up.

At least the slurmd gives an error, so you could scan the logs for this
error and maybe use that to automate something.

Best,
Marcus

On 08.07.21 16:58, Jeffrey T Frey wrote:
>> I understand that there is no output file to write an error message to, but it might be good to check the `--output` path during the scheduling, just like `--account` is checked.
>>
>> Does anybody know a workaround to be warned about the error?
>
> I would make a feature request of SchedMD to fix the issue, then I would write a cli_filter plugin to validate the --output/--error/--input paths as desired until Slurm itself handles it.
>
>

--
Marcus Vincent Boden, M.Sc.
eScience Working Group, HPC Team
Tel.:   +49 (0)551 201-2191, E-Mail: mboden at gwdg.de
-------------------------------------------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: https://www.gwdg.de

Support: Tel.: +49 551 201-1523, URL: https://www.gwdg.de/support
Secretariat: Tel.: +49 551 201-1510, Fax: -2150, E-Mail: gwdg at gwdg.de

Managing Director: Prof. Dr. Ramin Yahyapour
Chairman of the Supervisory Board: Prof. Dr. Norbert Lossau
Registered office: Göttingen
Court of registration: Göttingen, commercial register no. B 598

Certified according to ISO 9001
-------------------------------------------------------------------------



--
Killian Murphy
Research Software Engineer

Wolfson Atmospheric Chemistry Laboratories
University of York
Heslington
York
YO10 5DD
+44 (0)1904 32 1223

e-mail disclaimer: http://www.york.ac.uk/docs/disclaimer/email.htm


---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

